Lexical and Statistical Semantics in Professional Search · 2018-10-04
Lexical and Statistical Semantics in
Professional Search
Allan Hanbury
Claims
1. A method for scrolling through portions of a data set, said method comprising: receiving a number of units associated with a rotational user input; determining an acceleration factor pertaining to the rotational user input; modifying the number of units by the acceleration factor; determining a next portion of the data set based on the modified number of units; and presenting the next portion of the data set.
2. A method as recited in claim 1, wherein the data set pertains to a list of items, and the portions of the data set include one or more of the items.
3. A method as recited in claim 1, wherein the data set pertains to a media file, and the portions of the data set pertain to one or more sections of the media file.
4. A method as recited in claim 3, wherein the media file is an audio file.
5. A method as recited in claim 1, wherein the rotational user input is provided via a rotational input device.
6. …
photo: wocintechchat.com
Search Systems
Context
• People
• Tasks
• Documents
• …
Technology
• Natural Language Processing
• Data-driven approaches
• …
Natural Language Processing
This is particularly preferable in the case of liquid injection molding (LIM).
This/DT is/VBZ particularly/RB preferable/JJ in/IN the/DT case/NN of/IN liquid/JJ injection/NN molding/NN (LIM/NN).
PIC Detection
Population, Intervention, Comparison
[Figure: Effectiveness of Asthma Therapies, on a scale from less effective to more effective]
2009
Data-Driven
P (Word1 | Word2)
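As a rough illustration of the data-driven view, a conditional probability such as P(Word1 | Word2) can be estimated by counting. The sketch below assumes sentence-level co-occurrence as the counting unit, which is only one of several possible choices and is not stated on the slide. The function name `conditional_probabilities` and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import permutations


def conditional_probabilities(sentences):
    """Return a function p(word1, word2) estimating P(word1 | word2).

    Assumption: two words co-occur when they appear in the same sentence.
    """
    word_counts = Counter()   # number of sentences containing each word
    pair_counts = Counter()   # number of sentences containing both words
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        word_counts.update(tokens)
        pair_counts.update(permutations(tokens, 2))

    def p(word1, word2):
        if word_counts[word2] == 0:
            return 0.0
        return pair_counts[(word1, word2)] / word_counts[word2]

    return p


# Toy corpus, invented for illustration only.
p = conditional_probabilities(["the cat sat", "the dog sat", "a cat ran"])
print(p("the", "sat"))  # fraction of sentences with "sat" that also contain "the"
```

Real systems would typically use much larger corpora and some form of smoothing; this sketch does neither.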
“Every time I fire a linguist, the performance of the speech recognizer goes up.”
Frederick Jelinek, IBM
Around 1985
Bag of Words
CC image courtesy of Flickr/surrealmuse
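A minimal sketch of the bag-of-words idea, assuming whitespace tokenisation and lowercasing (both simplifications). The function name `bag_of_words` is hypothetical. The point it illustrates is that word order is discarded and only term counts remain.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as unordered term counts."""
    return Counter(text.lower().split())


# Two sentences with different meanings share one representation,
# because the bag keeps counts but not word order.
first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")
print(first == second)
```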
“In most cases, the meaning of a word is its use.”
Ludwig Wittgenstein, Philosophical Investigations (1953)
A bottle of tesgüino is on the table.
Everybody likes tesgüino.
Tesgüino makes you drunk.
We make tesgüino out of corn.
Example from J. R. Firth
Statistical semantics extracts lexical information from the statistics of large amounts of text
Word Embedding
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
Input Layer: ~10^4 to 10^5 inputs
Hidden Layer: ~10^2 nodes
Output Layer: # outputs = # inputs
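The layer sizes above can be made concrete with a small forward pass. This is a sketch of a generic skip-gram style network in plain Python, assuming a one-hot input, a linear hidden layer, and a softmax output; the tiny sizes, the random initialisation, and the names `make_skipgram` and `forward` are illustrative assumptions, and no training is shown.

```python
import math
import random


def make_skipgram(vocab_size, hidden_size, seed=0):
    """Create randomly initialised input and output weight matrices."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(vocab_size)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)]
             for _ in range(hidden_size)]
    return w_in, w_out


def forward(word_index, w_in, w_out):
    """Return the hidden vector and output distribution for one input word."""
    # A one-hot input simply selects one row of the input matrix;
    # that row is the word's embedding.
    hidden = w_in[word_index]
    vocab_size = len(w_out[0])
    scores = [sum(hidden[h] * w_out[h][j] for h in range(len(hidden)))
              for j in range(vocab_size)]
    # Softmax over the vocabulary, shifted by the maximum for stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


# Toy sizes; the slide suggests roughly 10^4 to 10^5 inputs and 10^2 hidden nodes.
w_in, w_out = make_skipgram(vocab_size=6, hidden_size=3)
embedding, probabilities = forward(2, w_in, w_out)
print(len(embedding), len(probabilities))
```

The output has one probability per vocabulary word, which is why the number of outputs equals the number of inputs.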
What is more similar to a typewriter: (a) a bird, or (b) a snowball?
Similarity Threshold
Top k vs. Threshold
Terms most similar to "book" (similarity score):
books 0.82
foreword 0.77
author 0.74
published 0.73
preface 0.69
republished 0.68
reprinted 0.68
afterword 0.67
memoir 0.67
Terms most similar to "dwarfish" (similarity score):
corpulent 0.44
hideous 0.43
unintelligent 0.42
wizened 0.42
catoblepas 0.42
creature 0.42
humanoid 0.41
grotesquely 0.41
tomtar 0.41
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon. Exploration of a Threshold for Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research (ECIR 2017).
Navid Rekabsaz, Mihai Lupu, Allan Hanbury. Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity. In Proceedings of the Neu-IR Workshop of the ACM Conference on Research and Development in Information Retrieval (NeuIR-SIGIR 2016).
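The contrast between a fixed top-k cut-off and a similarity threshold can be sketched with the scores shown on this slide. The threshold value of 0.6 below is an arbitrary illustrative choice, not the value derived in the cited papers, and the function names `top_k` and `above_threshold` are hypothetical.

```python
# Neighbour lists and scores as shown on the slide.
BOOK = [("books", 0.82), ("foreword", 0.77), ("author", 0.74),
        ("published", 0.73), ("preface", 0.69), ("republished", 0.68),
        ("reprinted", 0.68), ("afterword", 0.67), ("memoir", 0.67)]
DWARFISH = [("corpulent", 0.44), ("hideous", 0.43), ("unintelligent", 0.42),
            ("wizened", 0.42), ("catoblepas", 0.42), ("creature", 0.42),
            ("humanoid", 0.41), ("grotesquely", 0.41), ("tomtar", 0.41)]


def top_k(neighbours, k):
    """Keep the k highest-scoring neighbours, however weak they are."""
    return sorted(neighbours, key=lambda pair: pair[1], reverse=True)[:k]


def above_threshold(neighbours, threshold):
    """Keep only neighbours whose similarity reaches the threshold."""
    return [(term, score) for term, score in neighbours if score >= threshold]


ILLUSTRATIVE_THRESHOLD = 0.6  # assumption for this example only

# Top-k always returns k terms, even when none of them is very similar.
print(top_k(DWARFISH, 3))
# A threshold returns a different number of terms per word.
print(len(above_threshold(BOOK, ILLUSTRATIVE_THRESHOLD)))
print(len(above_threshold(DWARFISH, ILLUSTRATIVE_THRESHOLD)))
```

With these numbers, top-k returns weakly related neighbours for "dwarfish", while the threshold keeps every listed neighbour of "book" and none of "dwarfish".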
Word Embedding in Search
[Figure: occurrences in a document of terms similar to the query term "knowledge", such as "wisdom" and "understanding", are counted as weighted occurrences of it (0.7 × knowledge, 0.8 × knowledge).]
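One way to read this slide is that each occurrence of a similar term contributes a fraction of an occurrence of the query term. The sketch below follows that reading. The function name `extended_term_frequency` is hypothetical, the assignment of 0.7 and 0.8 to particular terms is an assumption (the slide does not make the pairing recoverable), and this is not presented as the exact model of the cited work.

```python
def extended_term_frequency(query_term, doc_tokens, similar_terms):
    """Count the query term plus similarity-weighted counts of similar terms."""
    tf = float(doc_tokens.count(query_term))
    for term, similarity in similar_terms.items():
        tf += similarity * doc_tokens.count(term)
    return tf


# Weights taken from the slide; which weight belongs to which term is assumed.
similar_to_knowledge = {"wisdom": 0.7, "understanding": 0.8}
document = ["knowledge", "wisdom", "understanding", "knowledge"]

# Two exact matches plus two weighted matches: approximately 3.5.
print(extended_term_frequency("knowledge", document, similar_to_knowledge))
```

The resulting value could then stand in for the raw term frequency in a standard ranking function.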
[Figure: MAP improvement]
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon. Generalizing Translation Models in the Probabilistic Relevance Framework. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM 2016).
Results for Patent Search
CLEF-IP patent collection
Multi-Word Terms in Patents
coating method
memory data processor
complex programmable logic device
rear cross frame member
document editing device
Document Length
Sentence Complexity
New Terminology
Search Systems
Context
Technology
Evaluation
Acknowledgements
Linda Andersson, Alexandros Bampoulidis, Tobias Fink, Aldo Lipani, Mihai Lupu, João Palotti, Florina Piroi, Navid Rekabsaz, Abdel Aziz Taha, Markus Zlabinger
Univ.-Prof. Dr. Allan Hanbury
Institute for Information Systems Engineering
TU Wien
Favoritenstraße 9-11/194-04
1040 Vienna
Austria
Telephone: +43 1 58801 188310
Mobile: +43 676 978 0991
e-Mail: [email protected]