Content Discovery Through Entity Driven Search
ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014
• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
• Alfresco Partner of the Year 2012 and 2013
Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master in Engineering and Technology Software
- Digital Identity and Security expert
- Enterprise Search Background
- Semantic, NLP, ML Technologies and Information Retrieval lover
- Apache Stanbol Committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP, ML Technologies and Information Retrieval lover
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti
Agenda
• Context
• Problem
• Solution
• Demo
• Future Works
Zaizi R&D Department
• Giving sense to the content
• Enriching it semantically
• Adding value to ECM/CMS
• More structured content that is easier to manage, link and search
• Improving search
• Across different domains, data sources, User Experience
• Machine Learning applied research
• Content Organization – Recommendation Systems
Enterprise Search Problems
Challenge: Search within Big and Heterogeneous Repositories
• Heterogeneous Data Sources
• Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
• PDFs, plain text, Word, …
• Documents not linked to each other
• Federated Search needed
• Search across data sources
• Different permissions
• Centralized endpoint
Current Enterprise Search Weaknesses
• Keyword based
• Low precision
• Ambiguous terms are not resolved in context
• Inaccurate weighting when keywords are combined in a query
Entity Driven Search
• Moves from keywords to entities
• More understandable to a human
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
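The steps above (process the text, enrich it, build specific indexes, search by entities) can be sketched in a few lines. This is a minimal illustration only, not the system presented in the talk: the gazetteer, the `ent:` identifiers, the sample documents and all function names are assumptions made up for the example, and a real pipeline would use an NLP engine rather than substring matching.

```python
from collections import defaultdict

# Hypothetical toy gazetteer: surface form -> (entity id, entity type).
GAZETTEER = {
    "london": ("ent:London", "City"),
    "alfresco": ("ent:Alfresco", "Organization"),
    "apache solr": ("ent:Apache_Solr", "Software"),
}


def extract_entities(text):
    """Return the (entity id, entity type) pairs whose surface form occurs in the text."""
    lowered = text.lower()
    return {entity for surface, entity in GAZETTEER.items() if surface in lowered}


def build_entity_index(docs):
    """Build two inverted indexes: entity id -> doc ids and entity type -> doc ids."""
    by_entity, by_type = defaultdict(set), defaultdict(set)
    for doc_id, text in docs.items():
        for entity_id, entity_type in extract_entities(text):
            by_entity[entity_id].add(doc_id)
            by_type[entity_type].add(doc_id)
    return by_entity, by_type


# Made-up sample documents.
docs = {
    "d1": "Zaizi is an Alfresco partner based in London.",
    "d2": "The index is stored in Apache Solr.",
    "d3": "London hosts many search meetups.",
}
by_entity, by_type = build_entity_index(docs)
```

Looking up `by_entity["ent:London"]` returns the documents that mention the entity, while `by_type["City"]` returns every document that mentions any city, which a keyword index cannot answer directly.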
Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks
Architecture
RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Open-Source based components
• Entity Linking using Knowledge Bases
NLP & Semantic Enrichment
• From unstructured to structured
• NLP analysis: POS tagging
• Named Entity Recognition
• Linked Data
• Entity Linking using Knowledge Bases
• Disambiguation
• Indexing in Solr
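To make the disambiguation and indexing steps concrete, here is a small sketch under stated assumptions. The candidate list, the context-overlap scoring and the field names `entity_ids` and `entity_types` are hypothetical; the talk does not say how disambiguation is scored or which Solr fields are used, and no Solr client call is shown here.

```python
# Hypothetical knowledge-base candidates for an ambiguous surface form.
CANDIDATES = {
    "jaguar": [
        {"id": "ent:Jaguar_Cars", "type": "Organization",
         "context": {"car", "engine", "vehicle", "drive"}},
        {"id": "ent:Jaguar_(animal)", "type": "Animal",
         "context": {"cat", "jungle", "prey", "species"}},
    ]
}


def disambiguate(surface, text):
    """Pick the candidate whose context words overlap most with the text."""
    tokens = set(text.lower().replace(".", " ").replace(",", " ").split())
    candidates = CANDIDATES.get(surface.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c["context"] & tokens))


def to_index_document(doc_id, text, surface):
    """Build a plain dict that could be sent to a search index (field names assumed)."""
    doc = {"id": doc_id, "text": text}
    entity = disambiguate(surface, text)
    if entity is not None:
        doc["entity_ids"] = [entity["id"]]
        doc["entity_types"] = [entity["type"]]
    return doc


animal_doc = to_index_document("d1", "The jaguar stalks its prey in the jungle.", "jaguar")
car_doc = to_index_document("d2", "The new Jaguar has a powerful engine.", "jaguar")
```

The same surface form ends up linked to different entities depending on the surrounding words, so the two documents can be told apart at query time.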
Smart Autocomplete
• Multi Phase suggestions
• Closer to natural language query formulation
• Named Entities infix
• Entity types infix
• Multi Language entity type support
• Properties driven query approach
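One possible reading of infix suggestions over named entities and entity types is sketched below. The label lists, the language codes and the ranking rule (prefix hits before other infix hits) are assumptions for illustration; the slides do not describe the actual suggester.

```python
# Hypothetical suggestion sources.
ENTITY_LABELS = ["London", "New London", "Londonderry", "Paris"]
ENTITY_TYPE_LABELS = {
    "en": ["City", "Organization", "Person"],
    "es": ["Ciudad", "Organización", "Persona"],
}


def infix_matches(fragment, labels):
    """Return labels containing the fragment anywhere, prefix matches first."""
    needle = fragment.casefold()
    hits = [label for label in labels if needle in label.casefold()]
    return sorted(hits, key=lambda label: (not label.casefold().startswith(needle), label))


def suggest(fragment, language="en"):
    """Suggest named entities and, separately, entity types in the given language."""
    return {
        "entities": infix_matches(fragment, ENTITY_LABELS),
        "entity_types": infix_matches(fragment, ENTITY_TYPE_LABELS.get(language, [])),
    }
```

For example, `suggest("lon")` proposes "London", "Londonderry" and "New London" as entities, while `suggest("ciu", language="es")` proposes the entity type "Ciudad".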
Smart Autocomplete Configuration
• Entity type properties
• Interesting to our use case and scenario
• Properties inheritance through type hierarchy
• Enhance type information from external resources
• Freebase, DBpedia, custom data set
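Property inheritance through a type hierarchy can be illustrated as follows. The hierarchy and the property names are invented for the example; in practice they would come from the configured knowledge base.

```python
# Hypothetical type hierarchy (child -> parent) and per-type properties.
TYPE_PARENT = {"Thing": None, "Place": "Thing", "City": "Place", "Person": "Thing"}
TYPE_PROPERTIES = {
    "Thing": ["name"],
    "Place": ["country"],
    "City": ["population"],
    "Person": ["birth date"],
}


def properties_for(entity_type):
    """Collect properties of a type plus those inherited from all its ancestors."""
    chain = []
    current = entity_type
    while current is not None:
        chain.append(current)
        current = TYPE_PARENT.get(current)
    properties = []
    # Walk from the root down so inherited properties come first.
    for type_name in reversed(chain):
        for prop in TYPE_PROPERTIES.get(type_name, []):
            if prop not in properties:
                properties.append(prop)
    return properties
```

With this configuration a "City" exposes `name` and `country` in addition to its own `population`, so the autocomplete could offer all three when the user selects that entity type.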
Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
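Grouping results by sense can be pictured as bucketing hits by the entity each document was linked to. The result structure and the `sense` field below are assumptions for the sketch, not the format used by the presented system.

```python
from collections import defaultdict


def group_by_sense(results):
    """Group result ids by the linked entity ("sense"); unlinked hits go to "unknown"."""
    groups = defaultdict(list)
    for result in results:
        groups[result.get("sense", "unknown")].append(result["id"])
    return dict(groups)


# Made-up hits for the ambiguous query "jaguar".
results = [
    {"id": "d1", "sense": "ent:Jaguar_Cars"},
    {"id": "d2", "sense": "ent:Jaguar_(animal)"},
    {"id": "d3", "sense": "ent:Jaguar_Cars"},
    {"id": "d4"},
]
grouped = group_by_sense(results)
```

The user then sees one group per meaning of the query term instead of a single mixed list.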
Semantic More Like This
• Search for Similar Documents based on Entities and Entities’ categories
• Similarity Function based on Documents’ Sense
• Not based on text tokens
• Entity Frequency / Inverse Document Frequency
• Entity Type Frequency / Inverse Document Frequency
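A similarity function of this kind can be sketched by applying the familiar TF-IDF weighting to entities instead of text tokens. The smoothed IDF formula, the cosine similarity and the sample data are assumptions chosen for the example; the slides only name the Entity Frequency / Inverse Document Frequency idea.

```python
import math
from collections import Counter


def ef_idf_vectors(doc_entities):
    """Weight each entity by its frequency in the document times a smoothed IDF."""
    n_docs = len(doc_entities)
    doc_freq = Counter(e for entities in doc_entities.values() for e in set(entities))
    vectors = {}
    for doc_id, entities in doc_entities.items():
        counts = Counter(entities)
        vectors[doc_id] = {
            e: count * (math.log((1 + n_docs) / (1 + doc_freq[e])) + 1.0)
            for e, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(doc_id, vectors):
    """Rank all other documents by entity-based similarity to the given one."""
    scores = [(other, cosine(vectors[doc_id], vec))
              for other, vec in vectors.items() if other != doc_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up entity annotations per document (repeats count as frequency).
doc_entities = {
    "d1": ["ent:London", "ent:Alfresco", "ent:London"],
    "d2": ["ent:London", "ent:Alfresco"],
    "d3": ["ent:Apache_Solr"],
}
vectors = ef_idf_vectors(doc_entities)
ranking = more_like_this("d1", vectors)
```

Replacing the entity ids with entity types in `doc_entities` gives the Entity Type Frequency variant with the same code.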
Future Work
• New approach to Semantic More Like This (graph relations)
• Machine Learning components: Classification, Topic annotation, Clustering
• Semantic facets
• Secured Entity Search
• Image and Media searches
Conclusions
• Better user experience
• More precision in search results
• Closer to human language
Zaizi Headquarters: Brook House, 4th Floor, North Wing, 229-243 Shepherd’s Bush Road, London W6 7AN, United Kingdom. T: (+44) 20 3582 8330
Zaizi Iberia: Calle Gremios 13-15, Edificio Diseño, Planta 1, Oficina 5, 41927 Mairena del Aljarafe, Sevilla, Spain. T: (+34) 666 42 43 64
Zaizi Asia: 50 Flower Road, Colombo 07, Sri Lanka. T: (+94) 112 301 461
Zaizi Singapore: 14 Robinson Road #13-00, Far East Finance Building, Singapore 048545. T: (+65) 3158 5886, F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!