multilingual search system

Multilingual Search System TEAM NAME –SHIELD Vamshi Krishna Padidela(50169645) Manikant Manohar Kapuganti(50170071) Pramod Rangaraju(50169514) Sudheer Bondada(50170321) Nikhil Ayyagari(50169485)

Upload: manikant

Post on 28-Jan-2016


DESCRIPTION

A multilingual search system built as an Information Retrieval project. The presentation covers the implementation of the search system using Solr.

TRANSCRIPT

Page 1: Multilingual Search System

Multilingual Search System

Team Name: SHIELD

Vamshi Krishna Padidela (50169645)

Manikant Manohar Kapuganti (50170071)

Pramod Rangaraju (50169514)

Sudheer Bondada (50170321)

Nikhil Ayyagari (50169485)

Page 2: Multilingual Search System

Introduction

In this project, we built a retrieval system powered by Solr to search within tweets.

The dataset contains 11,000 tweets in multiple languages, collected through Twitter’s REST API. The tweets cover two topics, ISIS and health, each with significant subtopics.

The UI for the search system is built on the Banana framework, which has powerful dashboard capabilities for visualizing big data analytics.
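
A keyword search against such a Solr index can be sketched as follows. This is a minimal illustration, not the project's actual code: the host, the core name "tweets", and the field name "text" are assumptions, and the function only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Assumed values: a local Solr instance and a core named "tweets".
SOLR_SELECT_URL = "http://localhost:8983/solr/tweets/select"


def build_search_url(user_query, rows=10):
    """Build a Solr select URL that searches tweets for the given terms."""
    params = {
        "q": user_query,  # the user's search terms
        "df": "text",     # default search field; the field name is an assumption
        "rows": rows,     # number of tweets to return
        "wt": "json",     # ask Solr for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("health care")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client once a Solr core with matching field names is running.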

Page 3: Multilingual Search System

We implemented the following components:

1. Content Tagging (Monolingual)

2. Faceted Search

3. Cross-Document Analytics

4. Topic Models and/or LSI

Page 4: Multilingual Search System

Content Tagging (Monolingual)

We realized content tagging using Alchemy’s Entity Extraction API.

The Alchemy API identifies proper nouns (places, people, organizations) using Natural Language Processing.

The tags returned by the Alchemy API for each tweet are added to that tweet in a new field, “tags”.

The new JSON file with the added “tags” field is re-indexed in Solr.

The tags give insight into metrics such as the popularity of a person or place over a period of time.
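
The tagging step above can be sketched roughly as below. The function `extract_entities` is a hypothetical stand-in for the entity-extraction call: it simply treats capitalized words as entities so that the sketch runs without a network request, and it does not reproduce the Alchemy API. The sample tweets are made up for illustration.

```python
import json
from collections import Counter


def extract_entities(text):
    """Hypothetical stand-in for the entity-extraction service.

    Treats capitalized words as entities; a real system would call the
    external API here instead.
    """
    words = (w.strip(".,!?:;\"'") for w in text.split())
    return [w for w in words if w[:1].isupper()]


def add_tags(tweets):
    """Return copies of the tweets with an added "tags" field."""
    return [dict(tweet, tags=extract_entities(tweet["text"])) for tweet in tweets]


def most_used_tags(tagged_tweets):
    """Count how often each tag occurs, most frequent first."""
    counts = Counter(tag for t in tagged_tweets for tag in t["tags"])
    return counts.most_common()


# Made-up sample tweets, for illustration only.
tweets = [
    {"id": "1", "text": "flu cases rising in Paris"},
    {"id": "2", "text": "new clinic opens in Paris and London"},
]

tagged = add_tags(tweets)
print(json.dumps(tagged, indent=2))  # JSON ready to be re-indexed
print(most_used_tags(tagged))
```

Adding the tweet's date to the counter key would give tag popularity per time period.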

Page 5: Multilingual Search System

Results from Alchemy API’s content tagging

Page 6: Multilingual Search System

Tags for a search field

Page 7: Multilingual Search System

Tags displayed in order of most frequent use

Page 8: Multilingual Search System

Faceted Search

Faceted search is available in the Banana framework, where a search can be narrowed by fields such as text, language, and location.

Facets work like filters, with the addition of a document count for each value.

Faceted search helps build dashboards for various analytical purposes.

Faceted search is also called faceted browsing, faceted navigation, guided navigation, and sometimes parametric search.
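
The difference between a facet and a filter can be sketched with Solr's standard request parameters (`facet`, `facet.field`, `fq`). This is an illustration under assumptions: the field name "lang" and the sample response are made up, and the code builds and parses data locally without contacting Solr.

```python
from urllib.parse import urlencode


def build_facet_params(query, facet_fields, filters=None):
    """Build a Solr query string that requests facet counts.

    facet.field asks for a count per value of a field;
    fq restricts the result set without affecting scoring.
    """
    params = [("q", query), ("rows", 0), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in (filters or [])]
    return urlencode(params)


def parse_facet_field(flat_counts):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    return dict(zip(flat_counts[0::2], flat_counts[1::2]))


# "lang" is an assumed field name.
query_string = build_facet_params("health", ["lang"], filters=["lang:en"])
print(query_string)

# Made-up response fragment in the shape Solr uses for field facets.
sample_response = {
    "facet_counts": {"facet_fields": {"lang": ["en", 3, "de", 2, "ru", 1]}}
}
lang_counts = parse_facet_field(
    sample_response["facet_counts"]["facet_fields"]["lang"]
)
print(lang_counts)
```

The per-value counts are what a dashboard panel, such as a pie chart, would plot.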

Page 9: Multilingual Search System

Facets and filters

Page 10: Multilingual Search System

Pie chart showing the geographical distribution

Page 11: Multilingual Search System

Cross-Document Analytics

Page 12: Multilingual Search System

Distribution of tweets against time and location

Page 13: Multilingual Search System

Topic Models-LSI

We implemented Latent Semantic Indexing (LSI) on the collected data to demonstrate semantic search as an alternative to keyword search.

Latent Dirichlet Allocation (LDA) is a probabilistic topic model that builds on the ideas behind LSI.

LDA extracts a set of topics from the collection.

LDA processes the tweets to find the topic distribution for each document and the document distribution for each topic.

The LDA algorithm runs on the vectors generated from the sequence file.

We are using MALLET (Machine Learning for Language Toolkit) for topic generation. (Results pending)
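
The idea of semantic search with LSI mentioned on this slide can be illustrated with a small sketch. It uses a truncated SVD from NumPy on a made-up five-document collection; it is not the project's pipeline and does not involve LDA or MALLET. The documents, the choice of k = 2, and all function names are assumptions for illustration.

```python
import numpy as np

# Made-up miniature collection: three health texts, two conflict texts.
docs = [
    "flu vaccine clinic",
    "flu virus outbreak",
    "virus outbreak hospital",
    "airstrike city border",
    "city border troops",
]

vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}


def to_vector(text):
    """Bag-of-words count vector over the collection vocabulary."""
    vec = np.zeros(len(vocab))
    for word in text.split():
        if word in index:
            vec[index[word]] += 1.0
    return vec


# Term-document matrix: one column per document.
A = np.column_stack([to_vector(doc) for doc in docs])

# Truncated SVD: keep the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]
doc_vecs = Vt[:k, :].T  # each row is a document in the latent space


def lsi_scores(query):
    """Cosine similarity between the folded-in query and every document."""
    q_k = (U_k.T @ to_vector(query)) / s_k
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k)
    return doc_vecs @ q_k / norms


scores = lsi_scores("vaccine")
keyword_hits = [i for i, doc in enumerate(docs) if "vaccine" in doc.split()]
print("keyword matches:", keyword_hits)
print("LSI scores:", np.round(scores, 2))
```

In this toy collection only the first document contains the word "vaccine", yet the LSI scores are high for all three health documents because they share a latent dimension through co-occurring terms.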

Page 14: Multilingual Search System

Search System UI – 1/2

Page 15: Multilingual Search System

Search System UI – 2/2

Page 16: Multilingual Search System

Thank You!!