computational linguistics a brief overview. computational linguistics might be considered as a...

Computational linguistics A brief overview

Upload: georgiana-golden

Post on 05-Jan-2016




Computational Linguistics

• Computational linguistics might be considered a synonym of automatic natural language processing, since its main task is precisely the construction of computer programs that process words and texts in natural language.


Purpose

AUTOMATIC HYPHENATION

• Hyphenation is intended for the proper splitting of words in natural language texts. When a word occurring at the end of a line is too long to fit on that line within the accepted margins, part of it is moved to the next line. The word is thus wrapped, i.e., split and partially transferred to the next line.
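The splitting step can be made concrete with a minimal Python sketch of rule-based hyphenation for Spanish-like words. It is not the method of any particular system. The consonant-cluster table `INSEPARABLE`, the treatment of adjacent vowels as a single syllable nucleus (so hiatus is ignored), and the helper names `break_points` and `hyphenate_to_fit` are simplifying assumptions made for illustration.

```python
# Simplified, Spanish-oriented syllable rules (an assumption for illustration).
VOWELS = set("aeiouáéíóúü")
# Consonant pairs kept together at the start of the next syllable (simplified list).
INSEPARABLE = {"bl", "br", "cl", "cr", "dr", "fl", "fr", "gl", "gr",
               "pl", "pr", "tr", "ch", "ll", "rr"}


def break_points(word: str) -> list[int]:
    """Return indices at which a hyphen may be inserted into word."""
    w = word.lower()
    # Spans (start, end) of maximal vowel runs; each run is one syllable nucleus.
    nuclei = []
    i = 0
    while i < len(w):
        if w[i] in VOWELS:
            j = i
            while j < len(w) and w[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    points = []
    # Decide where to break the consonant cluster between consecutive nuclei.
    for (_, end), (start, _) in zip(nuclei, nuclei[1:]):
        cluster = w[end:start]
        if len(cluster) == 1:
            points.append(end)        # V-CV: break before the single consonant
        elif cluster[-2:] in INSEPARABLE:
            points.append(start - 2)  # keep e.g. "bl" or "ch" with the next vowel
        else:
            points.append(start - 1)  # otherwise only the last consonant moves on
    return points


def hyphenate_to_fit(word: str, room: int) -> tuple[str, str]:
    """Split word so that the head plus a hyphen fits in `room` characters."""
    if len(word) <= room:
        return word, ""
    fitting = [p for p in break_points(word) if p + 1 <= room]
    if not fitting:
        return "", word  # no usable break: move the whole word to the next line
    p = max(fitting)
    return word[:p] + "-", word[p:]
```

For example, `hyphenate_to_fit("computadora", 6)` returns `("compu-", "tadora")`. This is the rightmost allowed break for which the head, plus the hyphen, still fits in the remaining room.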


SPELL CHECKING

• The objective of spell checking is to detect and correct typographic and orthographic errors in a text at the level of individual word occurrences, each considered out of its context.

• Nobody can write without any errors. Even people well acquainted with the rules of a language can accidentally press a wrong key on the keyboard (perhaps one adjacent to the correct key) or leave out a letter. Additionally, when typing, one sometimes does not properly synchronize the movements of the hands and fingers. All such errors are called typos, or typographic errors. On the other hand, some people do not know the correct spelling of some words, especially in a foreign language. Such errors are called spelling errors.
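Out-of-context spell checking can be sketched as follows. A word is accepted if it is in the dictionary. Otherwise, candidate corrections are generated by single edits that mirror the typos described above: a wrong key, a missing or extra letter, or two letters swapped by unsynchronized fingers. The toy `LEXICON` and the function names are hypothetical. Edit-distance-1 candidate generation is one common technique, not necessarily the one used by the systems the slides have in mind.

```python
import string

# Hypothetical toy dictionary; a real checker would load a full word list.
LEXICON = {"gato", "gatos", "mujer", "casa", "cosa", "texto"}
ALPHABET = string.ascii_lowercase + "áéíóúñü"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Deleting a letter undoes an extra key press by the typist.
    deletes = [a + b[1:] for a, b in splits if b]
    # Swapping two neighbours undoes badly synchronized fingers.
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    # Substituting a letter undoes a wrong key press.
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    # Inserting a letter undoes a missed letter.
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def spell_check(word: str) -> tuple[bool, list[str]]:
    """Return (is_known, suggestions) for one word, ignoring its context."""
    w = word.lower()
    if w in LEXICON:
        return True, []
    return False, sorted(edits1(w) & LEXICON)
```

Because each word is checked in isolation, such a checker accepts any misspelling that happens to be another valid word. That limitation is what motivates context-sensitive grammar checking.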


GRAMMAR CHECKING

• Detecting and correcting grammatical errors by taking into account adjacent words in the sentence, or even the whole sentence, is a much more difficult task for computational linguists and software developers than just checking orthography.

• Grammatical errors are those that violate, for example, syntactic laws, or the laws related to the structure of a sentence. In Spanish, one of these laws is the agreement between a noun and an adjective in gender and grammatical number. For example, in the combination *mujer viejos each word exists in Spanish by itself, but together they form a syntactically ill-formed combination. Another example of syntactic agreement is the agreement in number and person between the subject and the main verb (*tú tiene).
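Agreement checking can be sketched as a comparison of the grammatical features of two words. In this minimal Python example, the feature lexicon `LEXICON` and the agreement table `AGREEMENT` are hypothetical stand-ins for a real morphological dictionary. The example covers only the two agreement types mentioned above: noun and adjective in gender and number, and subject and verb in person and number.

```python
# Hypothetical feature lexicon: word -> grammatical features.
LEXICON = {
    "mujer": {"pos": "N", "gender": "f", "number": "sg"},
    "vieja": {"pos": "ADJ", "gender": "f", "number": "sg"},
    "viejos": {"pos": "ADJ", "gender": "m", "number": "pl"},
    "tú": {"pos": "PRON", "person": 2, "number": "sg"},
    "tienes": {"pos": "V", "person": 2, "number": "sg"},
    "tiene": {"pos": "V", "person": 3, "number": "sg"},
}

# Features that must agree for each (first word, second word) part-of-speech pair.
AGREEMENT = {
    ("N", "ADJ"): ("gender", "number"),
    ("PRON", "V"): ("person", "number"),
}


def agreement_errors(w1: str, w2: str) -> list[str]:
    """Return the features on which two adjacent words fail to agree ([] = OK)."""
    f1, f2 = LEXICON[w1], LEXICON[w2]
    required = AGREEMENT.get((f1["pos"], f2["pos"]), ())
    return [feat for feat in required if f1[feat] != f2[feat]]
```

For instance, `agreement_errors("mujer", "viejos")` returns `["gender", "number"]`, and `agreement_errors("tú", "tiene")` returns `["person"]`.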


• Sometimes, rather simple operations can give helpful results by detecting some very frequent errors. Two classes of errors specific to Spanish can be mentioned here:


Grammar

• Absence of agreement in number and gender between an article and the following noun, as in *la gatos. Such errors are easily detectable within a very narrow context, i.e., two adjacent words. For this task, it is necessary to resort to the grammatical categories of Spanish words.

• Omission of the written accent in such nouns as *articulo, *genero, *termino. Such errors cannot be detected by a usual spell checker that takes words out of context, since they turn one existing word into another existing one, namely, a personal form of a verb. It is rather easy to define some properties of the immediate contexts of these nouns that never occur with the corresponding verbs, e.g., the presence of agreeing articles, adjectives, or pronouns [38].
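Both classes of errors can be caught by looking only at pairs of adjacent words. The sketch below assumes whitespace tokenization with no punctuation handling. Its small tables (`ARTICLES`, `NOUNS`, `ACCENT_PAIRS`, `NOUN_MARKERS`) are hypothetical. A real checker would draw these categories from a full Spanish dictionary and would use richer contexts (adjectives, pronouns) than the single preceding determiner checked here.

```python
# Hypothetical mini-lexicons: word -> (gender, number).
ARTICLES = {"el": ("m", "sg"), "los": ("m", "pl"), "la": ("f", "sg"), "las": ("f", "pl")}
NOUNS = {"gato": ("m", "sg"), "gatos": ("m", "pl"), "casa": ("f", "sg"), "casas": ("f", "pl")}

# Hypothetical table: unaccented form (also a verb form) -> accented noun.
ACCENT_PAIRS = {"articulo": "artículo", "genero": "género", "termino": "término"}
# Masculine singular determiners that, in this sketch, signal a following noun.
NOUN_MARKERS = {"el", "un", "este", "ese", "mi", "su"}


def bigrams(sentence: str):
    """Yield (index of right word, left, right) for adjacent lower-cased tokens."""
    tokens = sentence.lower().split()
    for i in range(1, len(tokens)):
        yield i, tokens[i - 1], tokens[i]


def article_noun_errors(sentence: str) -> list[str]:
    """Flag article+noun pairs that disagree in gender or number, e.g. *la gatos."""
    return [f"{a} {n}" for _, a, n in bigrams(sentence)
            if a in ARTICLES and n in NOUNS and ARTICLES[a] != NOUNS[n]]


def restore_accents(sentence: str) -> str:
    """Restore the written accent on a noun when a determiner precedes it."""
    tokens = sentence.split()
    for i, left, right in bigrams(sentence):
        if right in ACCENT_PAIRS and left in NOUN_MARKERS:
            tokens[i] = ACCENT_PAIRS[right]
    return " ".join(tokens)
```

For instance, `restore_accents("leí el articulo ayer")` yields `"leí el artículo ayer"`. By contrast, `"yo articulo bien"` is left unchanged, because the verb reading is plausible after a pronoun.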


STYLE CHECKING

• Stylistic errors are those that violate the norms for the correct use of words and word combinations, either in the language in general or in a given literary genre.

• In its tasks, this application is closest to printed normative grammars and style manuals intended for human readers. Thus, style checkers play a didactic and prescriptive role for authors of texts.


• The style checker should use a dictionary of words supplied with their usage marks, synonyms, information on proper use of prepositions, compatibility with other words, etc. It should also use automatic parsing, which can detect improper syntactic constructions.
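A dictionary with usage marks can drive a very simple style check. In this sketch, the `STYLE_DICT` entries, the usage labels, and the per-genre `ALLOWED` table are invented for illustration. Words missing from the dictionary are treated as neutral, and the parsing-based checks mentioned above are not attempted.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    usage: str                                          # usage mark, e.g. "colloquial"
    synonyms: list[str] = field(default_factory=list)   # suggested replacements


# Hypothetical style dictionary; words not listed are treated as neutral.
STYLE_DICT = {
    "gonna": Entry("colloquial", ["going to"]),
    "kinda": Entry("colloquial", ["rather", "somewhat"]),
    "hereinafter": Entry("legal", ["below"]),
}

# Usage marks assumed acceptable in each genre (illustrative only).
ALLOWED = {
    "academic": {"neutral"},
    "chat": {"neutral", "colloquial"},
}


def style_warnings(text: str, genre: str) -> list[tuple[str, str, list[str]]]:
    """List (word, usage mark, suggested synonyms) for words unsuitable in genre."""
    warnings = []
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        entry = STYLE_DICT.get(word)
        if entry is not None and entry.usage not in ALLOWED[genre]:
            warnings.append((word, entry.usage, entry.synonyms))
    return warnings
```

The same sentence can be acceptable in one genre and flagged in another. For example, "gonna" passes in the hypothetical `"chat"` genre but is reported, with "going to" as a suggestion, in `"academic"`.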


REFERENCES TO WORDS AND WORD COMBINATIONS

• Synonyms, antonyms


INFORMATION RETRIEVAL

• Information retrieval systems (IRS) are designed to search for relevant information in large documentary databases. This information can be of various kinds, as can the queries, e.g., “Find all the documents containing the word writing”. Accordingly, different systems use different search methods.
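One classic way to answer a query like the one above is an inverted index, which maps each word to the documents that contain it. This minimal sketch assumes whitespace tokenization and exact word matching, with no stemming or ranking. `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:!?")].add(doc_id)
    return index


def search(index: dict[str, set[int]], *words: str) -> set[int]:
    """Return ids of documents containing every query word (Boolean AND)."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


docs = {1: "Writing is hard.", 2: "Reading and writing.", 3: "Reading only."}
index = build_index(docs)
print(search(index, "writing"))             # documents 1 and 2
print(search(index, "reading", "writing"))  # document 2
```

Building the index once makes each query a few set lookups rather than a scan of every document. This is why the approach scales to large documentary databases.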


TOPICAL SUMMARIZATION

• In many cases, it is necessary to determine automatically what a given document is about. This information is used to classify documents by their main topics, to deliver documents on a specific subject to users over the Internet, to index documents automatically in an IRS, to help people quickly find their way in a large set of documents, and for other purposes.
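A very simple way to guess a document's main topic is to count how many of its words belong to each topic's vocabulary. The `TOPICS` vocabularies below are hypothetical. This keyword-counting approach is only one illustrative technique, since the slides do not specify how topics are determined.

```python
from collections import Counter
from typing import Optional

# Hypothetical topic vocabularies; a real system would use a large topic dictionary.
TOPICS = {
    "sports": {"match", "goal", "team", "player"},
    "finance": {"market", "stock", "bank", "price"},
}


def main_topic(text: str) -> Optional[str]:
    """Return the topic whose vocabulary occurs most often in text, or None."""
    counts = Counter(t.strip(".,;:!?") for t in text.lower().split())
    scores = {topic: sum(counts[w] for w in vocab) for topic, vocab in TOPICS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

The same scores could serve the other uses listed above, for instance as index terms in an IRS or as a routing key for delivering documents to users.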




AUTOMATIC TRANSLATION


Summary

• It has been shown that only very simple tasks, like hyphenation or simple spell checking, can be solved on a modest linguistic basis. All other systems should employ relatively deep linguistic knowledge: dictionaries, morphological and syntactic analyzers, and in some cases deep semantic knowledge and reasoning. Moreover, nearly all of the discussed tasks, even spell checking, require very deep analysis to be solved with an accuracy approaching 100%. It was also shown that most language processing tasks can be considered special cases of the general task of language understanding, one of the ultimate goals of computational linguistics and artificial intelligence.