boutique big data: understanding 19th-century reprint culture with plagiarism detection software

Post on 15-Apr-2017

BOUTIQUE BIG DATA

Understanding 19th-Century Reprint Culture With Plagiarism Detection Software

M. H. Beals (ORCID: 0000-0002-2907-3313)

Loughborough University

@MHBEALS

THE HISTORICAL PROBLEM

• Culture of Reprinting in 18th and 19th Centuries

• Inconsistent Attribution

• Inconsistent Survival of Network Components

• Limited Historiographical Resources

Image Courtesy of Mike Licht (CC BY) at https://www.flickr.com/photos/notionscapital/2313507405

SEARCH AND TRANSCRIBE

Left Image Courtesy of Dan Tantrum (CC BY NC ND) at https://www.flickr.com/photos/tantrum_dan/2344581860

Meta Data

Page Text

COPYFIND REPRINT DETECTION

• Freeware Programme Developed by Lou Bloomfield

http://plagiarism.bloomfieldmedia.com/z-wordpress/software/copyfind/

• Highly Customisable Search As Well as Open Source

• Measures Left, Right and Overall Matches

• Displays Left-Right Comparisons of Text

• Extremely Effective at Discovering OCR-Transcribed Matches

Image Courtesy of Lou Bloomfield at http://rabi.phys.virginia.edu/lab3e/
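The left and right measures listed above can be illustrated with a small sketch. This is not Copyfind's own algorithm or code: it is a simplified shared-phrase comparison, and the function names, the phrase length of three words and the returned field names are all assumptions made for illustration. The slides do not say how the overall figure is calculated, so the sketch reports only the per-side figures.

```python
from typing import Dict, List, Set, Tuple


def _phrases(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of n-word phrases found in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def _matched_word_count(words: List[str], shared: Set[Tuple[str, ...]], n: int) -> int:
    """Count the words that sit inside at least one shared phrase."""
    covered = set()
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in shared:
            covered.update(range(i, i + n))
    return len(covered)


def compare_texts(left_text: str, right_text: str, n: int = 3) -> Dict[str, float]:
    """Simplified left/right match measure (illustrative, not Copyfind itself)."""
    left = left_text.lower().split()
    right = right_text.lower().split()
    shared = _phrases(left, n) & _phrases(right, n)
    left_words = _matched_word_count(left, shared, n)
    right_words = _matched_word_count(right, shared, n)
    return {
        "left_words": left_words,
        "right_words": right_words,
        # Share of each document covered by shared phrases, as a percentage.
        "left_pct": 100.0 * left_words / len(left) if left else 0.0,
        "right_pct": 100.0 * right_words / len(right) if right else 0.0,
    }
```

Reporting both sides separately matters for reprints: a short paragraph lifted from a long article gives a high figure for the short text and a low figure for the long one.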

ESTABLISHING LIKELY CANDIDATES

• Single Year (1810-1819) Contained over 200,000 Possible Matches

• Removed Internal (Same Title) Reprints

• Restricted Match Size (90 Right, 90 Left or 160 Overall)

• Restricted Date Separation (200 Days)
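The three filters above can be sketched as a single test applied to each candidate pair. The record layout and names below are hypothetical, not taken from the project's code. The sketch also assumes that the figures 90, 90 and 160 are minimum scores, that meeting any one of them is enough, and that 200 days is a maximum gap; the slide does not state the units of the scores, so they are left as plain numbers.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CandidateMatch:
    """One possible match between two newspaper texts (hypothetical layout)."""
    left_title: str
    right_title: str
    left_date: date
    right_date: date
    left_score: float
    right_score: float
    overall_score: float


def is_likely_reprint(match: CandidateMatch,
                      min_side: float = 90,
                      min_overall: float = 160,
                      max_days: int = 200) -> bool:
    """Apply the three restrictions from the slide to one candidate."""
    # Remove internal reprints: both texts come from the same title.
    if match.left_title == match.right_title:
        return False
    # Restrict date separation (assumed to be a maximum gap in days).
    if abs((match.right_date - match.left_date).days) > max_days:
        return False
    # Restrict match size (assumed: any one threshold is sufficient).
    return (match.left_score >= min_side
            or match.right_score >= min_side
            or match.overall_score >= min_overall)
```

Checking the cheap conditions first (title, then date) means the score comparison only runs on pairs that could still qualify.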

[Slide images labelled by year: 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817]

DIRECTIONALITY

• Reprint Maps are Non-Linear, Similar to Phylogenetic Trees

• Paths of Specific Branches Dictated by Date, Content, Errors

• Similar Method to Meme-Tracking (Adamic et al., 2014)

• Attributions Are Often Red Herrings
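One way to picture how date, content and errors can dictate a branch is the sketch below. It is an assumed simplification, not the method used in the project: each printing is linked to the earlier-dated printing whose text is most similar, so a version that carries the same typographical error as an earlier one tends to attach to it. The class and function names are hypothetical, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Printing:
    """One appearance of a text in a newspaper (hypothetical layout)."""
    title: str
    printed: date
    text: str


def infer_parents(printings: List[Printing]) -> Dict[str, Optional[str]]:
    """Map each title to the title of its most likely source, or None for a root."""
    parents: Dict[str, Optional[str]] = {}
    for printing in printings:
        # Only strictly earlier printings can be a source.
        earlier = [p for p in printings if p.printed < printing.printed]
        if not earlier:
            parents[printing.title] = None
            continue
        # Pick the earlier printing with the most similar text.
        best = max(
            earlier,
            key=lambda p: SequenceMatcher(None, p.text, printing.text).ratio(),
        )
        parents[printing.title] = best.title
    return parents
```

Because the links are inferred from dates and text alone, the sketch ignores printed attributions entirely, which fits the warning that they are often red herrings. It cannot detect a lost intermediate printing, so the result is a plausible tree rather than a proven one.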

1818-1819

“WITHIN THIS COLLECTION”

WWW.SCISSORSANDPASTE.NET

WWW.GITHUB.COM/MHBEALS/SCISSORSANDPASTE

THANK YOU

M. H. Beals (ORCID: 0000-0002-2907-3313)

Loughborough University

@MHBEALS