context sensitive ppt

Upload: vinod-kumar

Post on 04-Jun-2018

  • 8/13/2019 Context Sensitive Ppt

    1/29

    SPELL CHECKER: BASIC AND CONTEXT SENSITIVE

    By:

    K Satish Kumar (07131A0544)

    B Krishna Chaitanya (07131A0547)

    K Viswa Sai Raja (07131A0546)

    SPELL CHECKER:

    A spell checker is an application program that flags words in a document that may not be spelt correctly.

    Our application corrects misspelled and confused words in a phrase written in a natural language.

    The application can offer several candidate words to replace an unrecognized word.

    Basic Spell Checking

    Errors that occur because the typed word is absent from the dictionary are known as non-word errors.

    Such errors can be detected and corrected using basic spell-checking capabilities.

    Examples: ths instead of this, spel instead of spell

    BASIC SPELL CORRECTION APPROACH:

    To perform basic spell checking, we first construct a trie containing all the words in a dictionary. Here, a dictionary is simply a text file listing the words.

    Once the trie is built, the given text (a single sentence or a group of sentences) is split into words, and each word is looked up in the trie.

    Any word that is not found is added to the misspelling list.
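The slides do not show the project's code, so the following is only a minimal Java sketch of the trie construction and lookup step. The class name `DictionaryTrie`, the letters-only tokenization and the lower-casing of every word are assumptions of this sketch. In practice the words would first be read from the dictionary text file (for example with `java.nio.file.Files.readAllLines`) and passed to `insert` one by one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical dictionary trie: one node per character, plus an end-of-word flag. */
class DictionaryTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds one dictionary word (stored lower-case, an assumption of this sketch). */
    void insert(String word) {
        Node node = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** True only if the whole word was inserted; a mere prefix does not count. */
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    /** Splits text on non-letters and returns every token missing from the trie. */
    List<String> findMisspellings(String text) {
        List<String> misspellings = new ArrayList<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (!token.isEmpty() && !contains(token)) {
                misspellings.add(token);
            }
        }
        return misspellings;
    }
}
```

Because `contains` checks the end-of-word flag, a prefix such as "thi" is rejected even though it lies on the path to "this".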

    Suggestions for the words in this list are generated using edit distance and phonetic distance criteria.

    To provide suggestions based on phonetic distance, we use the DoubleMetaphone class from the commons-codec-1.3 package provided by the Apache Software Foundation.
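The slides do not say which edit distance is used or how close a candidate must be. This sketch assumes classic Levenshtein distance and a caller-chosen maximum distance. It scans a lower-case word list and ranks candidates by distance, breaking ties alphabetically. `EditDistanceSuggester` is a hypothetical name, and the phonetic (DoubleMetaphone) criterion is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical edit-distance suggester over a lower-case word list. */
class EditDistanceSuggester {
    /** Levenshtein distance: fewest single-character insertions, deletions or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev; // the current row becomes the previous row
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Words within maxDistance of the misspelling, closest first, ties alphabetical. */
    static List<String> suggest(String misspelled, Collection<String> dictionary, int maxDistance) {
        String target = misspelled.toLowerCase(Locale.ROOT);
        List<String> suggestions = new ArrayList<>();
        for (String word : dictionary) {
            if (levenshtein(target, word) <= maxDistance) {
                suggestions.add(word);
            }
        }
        suggestions.sort(Comparator.comparingInt((String w) -> levenshtein(target, w))
                .thenComparing(Comparator.<String>naturalOrder()));
        return suggestions;
    }
}
```

With the dictionary [this, the, is, spell, tree, trek, three], `suggest("TREI", dictionary, 1)` returns [tree, trek]. These are the same alternatives a regular spell checker offers in the TREI example. Scanning every word is simple but linear in dictionary size. Walking the trie while filling in rows of the distance table is a common way to prune the search.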

    HOW IS OUR SPELL CHECKER DIFFERENT FROM A REGULAR SPELL CHECKER?

    INPUT: I saw TREI trees in the park

    REGULAR SPELL CHECKER: I saw [ TREE | TREK ] trees in the park

    INPUT: I saw TREE trees in the park

    CONTEXT SENSITIVE SPELL CHECKER: I saw THREE trees in the park

    Context Sensitive Spell Checking

    Recent research has focused on algorithms that can recognize a misspelled word from the context of the surrounding words, even when the word itself is in the vocabulary.

    Detecting and correcting spelling mistakes that result in real words of the target language, known as real-word spell checking, is the most challenging task for a spell-checking system.

    However, most spell-checking systems cannot catch errors such as "Let us meat today" (meat was typed when meet was intended). Catching this kind of error is known as context sensitive spell checking.

    Indeed, empirical studies have estimated that errors resulting in valid words account for 25% to more than 50% of all errors, depending on the application.

    Context Sensitive Spell Check Approach:

    To perform context-based spell checking, we take the help of a search engine. It can be Google, Yahoo!, Bing or any other engine that allows access to the search results of a query through an API.

    Yahoo! provides an API through which users registered with Yahoo! BOSS can submit an unlimited number of queries, so we use the search power of Yahoo!.

    Yahoo! Search BOSS (Build your Own Search Service) is an initiative in Yahoo! Search to open up Yahoo!'s search infrastructure and enable third parties to build revolutionary search products leveraging their own data, content, technology, social graph or other assets.

    Context Sensitive Spell Check Approach:

    In this project, we send requests to the Yahoo! BOSS web service to find a possible real-word error in the given sentence.

    Consider the following sentence:

    Let us meat today

    The sentence is sent to the Yahoo! web server in the following forms, with each word in turn replaced by a wildcard (*):

    * us meat today

    Let * meat today

    Let us * today

    Let us meat *
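Each query form above follows one rule: each word position in turn is replaced by *. The following is a short Java sketch of generating those strings. The class name `WildcardQueryBuilder` and whitespace tokenization are assumptions. The sketch only builds the query strings and does not call any search API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds one wildcard query per word position. */
class WildcardQueryBuilder {
    static List<String> buildQueries(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            String[] copy = Arrays.copyOf(words, words.length);
            copy[i] = "*"; // blank out the word at position i
            queries.add(String.join(" ", copy));
        }
        return queries;
    }
}
```

For "Let us meat today" this yields the four queries listed above, in the same order.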

    HOW TO USE THE YAHOO SERVICE:

    The Yahoo! web server returns the result count for each query sent. Based on the number of results received from the web server, we estimate which word in the given sentence is a possible real-word error.

    After the error has been detected, we generate suggestions based on features such as edit distance and phonetic distance.

    During the testing phase of the spell-checking application, however, we stored the most commonly confused word pairs. This let us skip the features above and simply check each word against its most likely confused counterpart.
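The following sketch shows the testing-phase shortcut described above. It checks a stored confusion set (for example meat and meet) by comparing result counts for the sentence with each alternative substituted. `HitCounter` is a hypothetical interface standing in for the Yahoo! BOSS request, whose API is not modeled here. The rule of switching only when an alternative gets strictly more hits is also an assumption, because the slides do not give the exact decision criterion.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical source of result counts; a real version would wrap a search API call. */
interface HitCounter {
    long count(String phrase);
}

/** Swaps a confusable word for an alternative when the alternative yields more hits. */
class ConfusionSetChecker {
    private final Map<String, List<String>> confusionSets; // e.g. "meat" -> ["meet"]
    private final HitCounter hits;

    ConfusionSetChecker(Map<String, List<String>> confusionSets, HitCounter hits) {
        this.confusionSets = confusionSets;
        this.hits = hits;
    }

    String correct(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            List<String> alternatives = confusionSets.get(words[i].toLowerCase(Locale.ROOT));
            if (alternatives == null) {
                continue; // no stored confusion pair for this word
            }
            String best = words[i];
            long bestCount = hits.count(String.join(" ", words));
            for (String alternative : alternatives) {
                String[] candidate = words.clone();
                candidate[i] = alternative;
                long count = hits.count(String.join(" ", candidate));
                if (count > bestCount) { // strictly more hits: ties keep the original word
                    best = alternative;
                    bestCount = count;
                }
            }
            words[i] = best;
        }
        return String.join(" ", words);
    }
}
```

For "Let us meat today", the checker compares the counts for "Let us meat today" and "Let us meet today" and keeps whichever is higher. Keeping the original word on ties avoids replacing a correct word when the counts give no evidence either way.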

    Yahoo BOSS Application ID

    MAIL FEATURE:

    JavaMail is a Java API used to send and receive email via SMTP, POP3 and IMAP. JavaMail is built into the Java EE platform and is also available as an optional package for use in Java SE.

    The JavaMail API provides a platform-independent and protocol-independent framework for building mail and messaging applications.

    In our project, we give users the option to send the spell-checked text to their mail account. We use the JavaMail API to send the text content to the specified email address.

    USE CASE DIAGRAM:

    CLASS DIAGRAM:

    ACTIVITY DIAGRAM:

    SEQUENCE DIAGRAMS:

    1. USER - APPLICATION:

    2. USER - APPLICATION - WEB SERVICE:

    3. USER - APPLICATION (MAIL):

    SCREEN SHOTS:

    OPEN FILE DIALOGUE:

    OPENED DOCUMENT:

    REPLACING MISSPELLINGS:

    WHAT IF NO SUGGESTION FOUND:

    ADD TO DICTIONARY:

    Context Sensitive Spell Checking

    MAIL FEATURE:

    REQUEST DETAILS DIALOGUE:

    Mail Received

    CONCLUSION:

    This spell checker is useful when our text needs rigorous checking (for example, when sending a document to higher officials).

    The Yahoo! BOSS API does not permit a large number of requests in a short time, so we used a delay of 9 seconds between consecutive requests. This delay should be reduced.
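The 9-second gap between requests can be enforced in one place rather than through scattered sleep calls. Below is a sketch of a hypothetical `RequestThrottle` class (not taken from the project). It would be constructed as `new RequestThrottle(9_000)`, and `acquire()` would be called before each search request.

```java
/** Hypothetical throttle enforcing a minimum gap between consecutive calls. */
class RequestThrottle {
    private final long minIntervalMillis;
    private long lastCallNanos;
    private boolean hasPreviousCall;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Blocks until minIntervalMillis have passed since the previous acquire() returned. */
    synchronized void acquire() throws InterruptedException {
        if (hasPreviousCall) {
            long elapsedMillis = (System.nanoTime() - lastCallNanos) / 1_000_000L;
            long waitMillis = minIntervalMillis - elapsedMillis;
            if (waitMillis > 0) {
                Thread.sleep(waitMillis); // lock is held, so concurrent callers queue up
            }
        }
        lastCallNanos = System.nanoTime();
        hasPreviousCall = true;
    }
}
```

Because `acquire` is synchronized and sleeps while holding the lock, concurrent callers are serialized, and each request still respects the minimum interval. Reducing the delay, as suggested above, then only means passing a smaller interval.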

    Besides this, the result of our query depends heavily on the results returned by the search engine. Sometimes the required pattern may not be found in the search results.

    Future enhancements could therefore include using our own database. A database of about 200 GB could be built from the Google trigram datasets. The sentence would then be matched against the trigrams to find the central word, and suggestions would be offered based on features such as edit and phonetic distances.
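The following sketch shows the trigram lookup proposed above and assumes the trigram counts have already been loaded. `TrigramIndex` and its in-memory `HashMap` are hypothetical, and this approach would not scale to a 200 GB dataset, which would need an on-disk database or index instead. Given the words on either side of a position, the sketch lists the central words seen between them, most frequent first. Filtering those candidates by edit or phonetic distance to the typed word would be a separate step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory trigram table keyed by the two outer words. */
class TrigramIndex {
    // "left right" context -> (central word -> total count)
    private final Map<String, Map<String, Long>> byContext = new HashMap<>();

    /** Records how often the trigram "left middle right" occurs. */
    void add(String left, String middle, String right, long count) {
        byContext.computeIfAbsent(left + " " + right, k -> new HashMap<>())
                .merge(middle, count, Long::sum);
    }

    /** Central words observed between left and right, most frequent first, ties alphabetical. */
    List<String> centralWords(String left, String right) {
        Map<String, Long> counts = byContext.getOrDefault(left + " " + right, Collections.emptyMap());
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Long.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }
}
```

For "Let us meat today", the context "us ... today" would be looked up. If "meet" appears there far more often than "meat", it becomes the leading suggestion.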