multilingual search system

Multilingual Search System TEAM NAME –SHIELD Vamshi Krishna Padidela(50169645) Manikant Manohar Kapuganti(50170071) Pramod Rangaraju(50169514) Sudheer Bondada(50170321) Nikhil Ayyagari(50169485)

Upload: manikant

Post on 28-Jan-2016

DESCRIPTION

Multilingual search system as part of Information Retrieval. The presentation deals with the implementation of a search system using Solr.

TRANSCRIPT

Multilingual Search System

TEAM NAME – SHIELD

Vamshi Krishna Padidela (50169645)

Manikant Manohar Kapuganti (50170071)

Pramod Rangaraju (50169514)

Sudheer Bondada (50170321)

Nikhil Ayyagari (50169485)

Introduction

In this project, we built a retrieval system powered by Solr to search within tweets.

The dataset includes 11,000 tweets in multiple languages, collected through Twitter’s REST API. The tweets cover two topics, ISIS and health, each with significant subtopics.

The UI for the search system is built on the Banana framework, which has powerful dashboard capabilities for visualizing big data analytics.

We have implemented the components below:

1. Content Tagging (Monolingual)

2. Faceted Search

3. Cross-Document Analytics

4. Topic Models and/or LSI

Content Tagging (Monolingual)

We implemented content tagging using Alchemy’s Entity Extraction API.

The Alchemy API identifies proper nouns (places, people, organizations) using Natural Language Processing.

The tags returned by the Alchemy API for each tweet are added to that tweet in a new field, “tags”.

The new JSON file with the added “tags” field is re-indexed in Solr.

The tags give insight into metrics such as the popularity of a person or place over a period of time.
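The tagging step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `fake_extractor` is a hypothetical stand-in for the entity extraction service (it simply keeps capitalized words), and the tweet fields `id`, `text`, and `lang` are assumed names.

```python
import json
from typing import Callable, Dict, List


def add_tags(tweets: List[Dict], extract_entities: Callable[[str], List[str]]) -> List[Dict]:
    """Return copies of the tweets with a "tags" field holding extracted entities."""
    tagged = []
    for tweet in tweets:
        enriched = dict(tweet)  # shallow copy so the input list stays untouched
        enriched["tags"] = extract_entities(tweet.get("text", ""))
        tagged.append(enriched)
    return tagged


def fake_extractor(text: str) -> List[str]:
    """Hypothetical stand-in for an entity extraction service: keeps capitalized words."""
    return [word.strip(".,!?") for word in text.split() if word[:1].isupper()]


# Assumed tweet layout; the real field names may differ.
tweets = [
    {"id": "1", "text": "Flu cases rise in Paris", "lang": "en"},
    {"id": "2", "text": "new health advice today", "lang": "en"},
]
tagged = add_tags(tweets, fake_extractor)

# Serialize the enriched tweets as a JSON array, ready for re-indexing.
payload = json.dumps(tagged)
print(payload)
```

The resulting JSON array is the kind of document list that Solr's JSON update handler accepts, so re-indexing would amount to posting `payload` to the core's update endpoint.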

Results from Alchemy API’s content tagging

Tags for a search field

Tags displayed in order of most used
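Ordering tags by usage can be sketched with a simple counter. The tweets and tag values below are made-up sample data, and the `tags` field layout is assumed to match the field added during content tagging.

```python
from collections import Counter

# Made-up sample of tagged tweets (illustrative only).
tagged_tweets = [
    {"id": "1", "tags": ["Paris", "WHO"]},
    {"id": "2", "tags": ["Paris"]},
    {"id": "3", "tags": []},
    {"id": "4", "tags": ["WHO", "Paris", "Syria"]},
]

# Count every tag occurrence across all tweets.
tag_counts = Counter(tag for tweet in tagged_tweets for tag in tweet["tags"])

# most_common() returns (tag, count) pairs sorted from most to least used.
ranked = tag_counts.most_common()
print(ranked)
```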

Faceted Search

Faceted search is available in the Banana framework, where the search can be limited by fields such as text, language, and location.

Facets function like filters, with the addition of a document count for each value.

Faceted search helps in building dashboards for various analytical purposes.

Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search.
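Under the hood, a dashboard facet corresponds to a Solr query with faceting enabled. The sketch below builds such a query URL and parses the facet counts from a response. The host, the core name `tweets`, and the field names `lang` and `location` are assumptions made for illustration; `facet`, `facet.field`, and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, filters=None, rows=10):
    """Build a Solr select URL that requests facet counts and applies filter queries."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", f"{field}:{value}") for field, value in (filters or {}).items()]
    return f"{base_url}/select?{urlencode(params)}"


def parse_facet_counts(flat):
    """Turn Solr's default flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical host, core, and field names.
url = build_facet_query(
    "http://localhost:8983/solr/tweets",
    "text:health",
    facet_fields=["lang", "location"],
    filters={"lang": "en"},
)
print(url)

# Example of the flat layout found under facet_counts.facet_fields.<field>.
print(parse_facet_counts(["en", 120, "fr", 45]))
```

The `fq` parameter narrows the result set like a filter, while `facet.field` adds the per-value document counts that distinguish facets from plain filters.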

Facets and filters

Pie chart showing the geographical distribution

Cross-Document Analytics

Distribution of tweets against time and location
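A distribution of this kind can be computed by counting tweets per day and location. The sketch below uses made-up sample data, and the field names `created_at` and `location` as well as the timestamp format are assumptions.

```python
from collections import Counter
from datetime import datetime

# Made-up sample tweets (illustrative only).
tweets = [
    {"created_at": "2015-11-20T10:15:00Z", "location": "London"},
    {"created_at": "2015-11-20T18:40:00Z", "location": "London"},
    {"created_at": "2015-11-20T09:05:00Z", "location": "Paris"},
    {"created_at": "2015-11-21T12:00:00Z", "location": "Paris"},
]


def day_of(timestamp: str) -> str:
    """Reduce an ISO-style UTC timestamp to its calendar day."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date().isoformat()


# Count tweets for every (day, location) pair.
by_day_location = Counter((day_of(t["created_at"]), t["location"]) for t in tweets)

for (day, location), count in sorted(by_day_location.items()):
    print(day, location, count)
```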

Topic Models-LSI

We implemented Latent Semantic Indexing (LSI) on the collected data to demonstrate semantic search instead of keyword search.
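For reference, the textbook LSI formulation is summarized below; it is not necessarily the exact pipeline used in this project. The symbols are introduced here for illustration: X is the term-document matrix (terms as rows, tweets as columns), k is the number of latent dimensions kept, q is a query vector in term space, and d̂_j is the reduced representation of tweet j.

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\hat{d}_j = \Sigma_k^{-1} U_k^{\top} d_j, \qquad
\operatorname{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Because queries and tweets are compared in the reduced k-dimensional space, a tweet can match a query without sharing any exact keyword with it.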

Latent Dirichlet Allocation (LDA) is a probabilistic topic model that builds on LSI by way of probabilistic LSI.

LDA extracts a collection of topics from the corpus.

LDA processes the tweets to find the topic distribution for each document and the document distribution for each topic.

The LDA algorithm is invoked on the vectors generated from the Sequence file.

We are using MALLET (MAchine Learning for LanguagE Toolkit) for topic generation. (Results pending)
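The two distributions mentioned above can be illustrated with a small example. This sketch does not perform LDA inference (that is the job of a toolkit such as MALLET); it only shows how the two views are read off a table of topic assignment counts. The counts are made up, and a real LDA estimate would also include smoothing from its Dirichlet priors.

```python
# Hypothetical counts: counts[d][t] = number of words in document d assigned to topic t.
counts = [
    [8, 2],
    [1, 9],
    [5, 5],
]


def normalize(values):
    """Scale non-negative values so they sum to 1 (all zeros if the total is 0)."""
    total = sum(values)
    return [v / total for v in values] if total else [0.0] * len(values)


# Topic distribution for each document: normalize each row.
doc_topic = [normalize(row) for row in counts]

# Document distribution for each topic: normalize each column.
num_topics = len(counts[0])
topic_doc = [normalize([row[t] for row in counts]) for t in range(num_topics)]

print(doc_topic)
print(topic_doc)
```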

Search System UI – 1/2

Search System UI – 2/2

Thank You!!
