Post on 28-Nov-2014

Translated and Paraphrased Plagiarism

The cat and mouse game continues...

One man’s rigor is another man’s mortis.

- CF Bohren and DR Huffman, 1983

Overview

The ever changing counter-detection landscape

Paraphrasing versus textual entailment

Ways to paraphrase

Tools of the trade

Many Roads to Plagiarism

The ‘old fashioned’ way

Translated plagiarism

Paraphrased plagiarism

Paraphrased plagiarism is not new either. However, there are new tools to aid in automatically paraphrasing text which are accelerating this form of detection avoidance.

Back-translation: the latest form of plagiarism

Michael Jones, University of Wollongong, Australia

4th Asia Pacific Conference on Educational Integrity (4APCEI) 28–30 September 2009

Translated into French:

Paraphrase plagiat n'est pas nouveau non plus. Toutefois, il existe de nouveaux outils pour l'aide dans le texte paraphrase automatiquement qui sont l'accélération de cette forme d'évasion de détection.

Translated back into English:

Paraphrase plagiarism is not new either. However, there are new tools to help in paraphrasing the text automatically, which are accelerating this form of escape detection.
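A minimal sketch of how the effect of back-translation on the surface text could be measured, using the original and back-translated sentences quoted above. The helper names (`tokenize`, `changed_spans`, `similarity`) are illustrative, and word-level diffing with Python's `difflib` is an assumption made here, not the method the slides describe.

```python
import difflib
import re

ORIGINAL = ("Paraphrased plagiarism is not new either. However, there are new "
            "tools to aid in automatically paraphrasing text which are "
            "accelerating this form of detection avoidance.")
BACK_TRANSLATED = ("Paraphrase plagiarism is not new either. However, there are "
                   "new tools to help in paraphrasing the text automatically, "
                   "which are accelerating this form of escape detection.")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def changed_spans(a, b):
    """Return (tag, words_from_a, words_from_b) for each non-matching span."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = difflib.SequenceMatcher(a=ta, b=tb, autojunk=False)
    return [(tag, ta[i1:i2], tb[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def similarity(a, b):
    """Word-level similarity in [0, 1]; 1.0 means identical token sequences."""
    ta, tb = tokenize(a), tokenize(b)
    return difflib.SequenceMatcher(a=ta, b=tb, autojunk=False).ratio()


if __name__ == "__main__":
    for span in changed_spans(ORIGINAL, BACK_TRANSLATED):
        print(span)
    print(similarity(ORIGINAL, BACK_TRANSLATED))
```

Most words survive the round trip, but the substitutions and re-orderings break up long exact-match runs, which is what makes back-translated text harder to catch with exact matching.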

Paraphrasing vs Textual Entailment

Two sentences are paraphrases of each other if they “mean the same thing”:

1) Similarity: they share a substantial amount of information

2) Dissimilarities are extraneous: if extra information in the sentences exists, the effect of its removal is not significant.

A paraphrase is a special case of textual entailment. Paraphrase is symmetric: each sentence entails the other. Textual entailment in general is directional: the two sentences overlap to a degree, with one sentence subsumed by the other.
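The difference between the two relations can be sketched with a directional word-coverage score. This is a rough illustration only: word overlap is a crude stand-in for meaning, and the function names and the 0.8 threshold are assumptions chosen for the example.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def coverage(covered, by):
    """Fraction of the words in `covered` that also occur in `by` (multiset)."""
    c, b = Counter(tokens(covered)), Counter(tokens(by))
    total = sum(c.values())
    if total == 0:
        return 0.0
    return sum((c & b).values()) / total


def relation(s1, s2, threshold=0.8):
    """Label a sentence pair; the threshold is an arbitrary illustrative value."""
    s1_in_s2 = coverage(s1, s2) >= threshold
    s2_in_s1 = coverage(s2, s1) >= threshold
    if s1_in_s2 and s2_in_s1:
        return "paraphrase candidate"   # overlap is high in both directions
    if s1_in_s2:
        return "s1 subsumed by s2"      # overlap is high in one direction only
    if s2_in_s1:
        return "s2 subsumed by s1"
    return "unrelated"
```

For example, `relation("They met Tuesday", "Tuesday they met")` yields a paraphrase candidate, while pairing "They met Tuesday" with a much longer sentence that contains those words yields a one-directional result.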

Ways to Paraphrase

Lexical substitution/synonymy

Hyponym/synonym/hypernym replacement: article, paper or red, crimson

Acronym replacement: Mr., mister

Contractions: do not, don’t

Compounding/decompounding: ballgame, ball game

Numeric/Alphabetic numbers: 11, eleven; 12/1/2010, December first two-thousand-ten
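One way to neutralise these lexical substitutions before comparing texts is to map each variant to a canonical form. The sketch below is only illustrative: the tiny lookup tables are hypothetical stand-ins for real resources such as WordNet, and `normalize` is a made-up helper name.

```python
import re

# Tiny illustrative tables; a real system would need far larger resources.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "isn't": "is not"}
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor"}
COMPOUNDS = {"ball game": "ballgame"}
NUMBER_WORDS = {"eleven": "11", "twelve": "12", "three": "3"}
SYNONYMS = {"paper": "article", "crimson": "red"}


def normalize(text):
    """Return a list of canonical word tokens for `text`."""
    text = text.lower().replace("’", "'")
    # Multi-character variants are replaced on the raw string, matching only
    # whole words so that "ball game" does not fire inside "football game".
    for table in (CONTRACTIONS, ABBREVIATIONS, COMPOUNDS):
        for variant, canonical in table.items():
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            text = re.sub(pattern, canonical, text)
    words = re.findall(r"[a-z0-9]+", text)
    # Single-word variants are replaced token by token.
    words = [NUMBER_WORDS.get(w, w) for w in words]
    return [SYNONYMS.get(w, w) for w in words]
```

After normalisation, "Don’t miss the ball game, Mr. Jones" and "Do not miss the ballgame, mister Jones" produce the same token list, so an exact-match comparison on the normalised tokens would treat them as equal.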

Active and passive exchange

The gangster killed 3 innocent people. vs Three innocent people were killed by the gangster.

Re-ordering of sentence components

Tuesday they met vs They met Tuesday

Zhou, Ming and Niu, Cheng. “Principled Approach to Paraphrasing.” U.S. Patent 11,934,010. 1 Nov 2007.

Realization in different syntactic components

Palestinian leader Arafat vs Arafat, Palestinian leader

Prepositional phrase attachment

The Alabama plant vs A plant in Alabama

Change into different sentence types

Who drew this picture? vs Tell me who drew this picture.

Morphological derivation

He is a good teacher. vs He teaches well. vs He is good at teaching.

Light verb construction

The film impressed him. vs The film made an impression on him.

Comparatives vs. superlatives

He is smarter than everyone else. vs He is the smartest one.

Converse word substitution

John is Mary's husband. vs Mary is John's wife.

Verb nominalization

He wrote the book. vs He was the author of the book.

Substitution using words with overlapping meanings

Bob excels at mathematics. vs Bob studies mathematics well.

Inference

He died of cancer. vs Cancer killed him.

Different semantic role realization

He enjoyed the game. vs The game pleased him.

Subordinate clauses vs separate sentences linked by anaphoric pronouns

The tree healed its wounds by growing new bark. vs The tree healed its wounds. It grew new bark.

Tools of the Trade

Microsoft paraphrase corpus

Used to test algorithms

WordNet: English only :(

Synonyms, hypernyms, hyponyms, and antonyms.

Algorithms: Finite State Transducers (FSTs) and/or iterative Longest Common Subsequence (LCS) on sets.
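For reference, a minimal sketch of the textbook longest-common-subsequence computation on word tokens. This is the plain dynamic-programming version, not necessarily the iterative, set-based variant the slide refers to, and `lcs_length` is an illustrative name.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    s1 = "he wrote the book".split()
    s2 = "he was the author of the book".split()
    print(lcs_length(s1, s2))
```

Unlike a bag-of-words comparison, the LCS respects word order, so a re-ordered pair such as "Tuesday they met" and "They met Tuesday" shares only a two-word subsequence even though all three words match.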

Stemming or lemmatization

am, are, is → be

car, cars, car's, cars' → car
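The two mappings above can be reproduced with a toy lemmatizer. This is a deliberately naive sketch covering only the slide's examples; the `IRREGULAR` table and the suffix rules are assumptions, and a real system would use an established stemmer or lemmatizer instead.

```python
# Irregular forms need a lookup table; only the slide's examples are listed.
IRREGULAR = {"am": "be", "are": "be", "is": "be"}


def lemmatize(word):
    """Map a word to a base form using a lookup table and suffix stripping."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Strip possessive markers: car's -> car, cars' -> cars.
    if w.endswith("'s"):
        w = w[:-2]
    elif w.endswith("s'"):
        w = w[:-1]
    # Naive plural stripping; wrong for words such as "lens" or "series".
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    return w


if __name__ == "__main__":
    print([lemmatize(w) for w in ["am", "are", "is"]])
    print([lemmatize(w) for w in ["car", "cars", "car's", "cars'"]])
```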

Word Alignment Examples

According to the MS paraphrase corpus:

This is a paraphrase

Not Paraphrased (However, the first sentence is textually entailed by the second. Turnitin would currently match this.)

12/14 = 86%, 12/16 = 75%

18/19 = 95%, 18/26 = 69%

Slippery Slope

When does textual entailment become arbitrary/noise?

14/48 = 29%, 14/34 = 41%

13/24 = 54%, 13/21 = 62%
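The percentages in the word alignment and slippery slope examples can be reproduced with a few lines of arithmetic, assuming each pair of figures is the number of aligned words divided by the length of each of the two sentences. That reading, the function names, and the idea of applying a single threshold are assumptions made for this sketch.

```python
def match_percentages(matched, len_a, len_b):
    """Return matched/len_a and matched/len_b as whole-number percentages."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("sentence lengths must be positive")
    return round(100 * matched / len_a), round(100 * matched / len_b)


def both_above(matched, len_a, len_b, threshold):
    """True when both directions reach `threshold` (a percentage)."""
    pct_a, pct_b = match_percentages(matched, len_a, len_b)
    return pct_a >= threshold and pct_b >= threshold


# (aligned words, length of sentence 1, length of sentence 2) from the slides.
SLIDE_PAIRS = [(12, 14, 16), (18, 19, 26), (14, 48, 34), (13, 24, 21)]

if __name__ == "__main__":
    for matched, len_a, len_b in SLIDE_PAIRS:
        print(match_percentages(matched, len_a, len_b),
              both_above(matched, len_a, len_b, threshold=70))
```

Where the threshold sits decides which pairs count as matches, which is the slippery slope in concrete terms: the lower it goes, the more one-directional or incidental overlap is reported.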

Translated Plagiarism

Non-English markets, in particular, are concerned about their English-as-a-second-language students submitting English documents that have been translated into the students' native language.

Initial approach:

Non-English documents searched as they are now

Additional search performed: Translate document to English, search English documents, and then display English matches with translations (or vice versa)
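The two-pass approach can be sketched as a small pipeline. Everything here is hypothetical: `search` and `translate` are placeholder callables supplied by the caller, not a real Turnitin or machine-translation API, and `Match` is an illustrative record type.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Match(NamedTuple):
    source_id: str
    matched_text: str     # text in the language of the matched source
    displayed_text: str   # the same text in the submitted document's language


def cross_language_search(
    document: str,
    doc_lang: str,
    search: Callable[[str, str], Iterable[Tuple[str, str]]],
    translate: Callable[[str, str, str], str],
    pivot_lang: str = "en",
) -> List[Match]:
    """Search in the document's own language, then again via translation."""
    matches = []
    # Pass 1: non-English documents are searched as they are now.
    for source_id, text in search(document, doc_lang):
        matches.append(Match(source_id, text, text))
    # Pass 2: translate, search the pivot-language sources, and translate
    # each match back so it can be displayed next to the submission.
    if doc_lang != pivot_lang:
        translated = translate(document, doc_lang, pivot_lang)
        for source_id, text in search(translated, pivot_lang):
            shown = translate(text, pivot_lang, doc_lang)
            matches.append(Match(source_id, text, shown))
    return matches
```

Passing a different `pivot_lang` covers the "vice versa" case, where an English submission is translated and checked against non-English sources.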

Our new strategic partner:

On-demand SaaS statistical machine translation

Translated Plagiarism: Need for Paraphrasing?

Machines and humans translate text in many different ways.

Paraphrase detection allows us to match the variations.

Google translate: The zeitgeist is thinking and feeling one age. The term describes the characteristics of a particular period, or an attempt to remind us it. The German word Zeitgeist is transferred through English as a loanword into numerous other languages been.

Bing translate: Zeitgeist is thinking and feeling how an age. Is the nature of a particular era or trying to understand them. The German word Zeitgeist is taken from English as a loanword in many other languages.

http://de.wikipedia.org/wiki/Zeitgeist

Possible Turnitin UI

Finis

Thank you for listening!

Questions?
