DSD
Department of Distributed Systems
MTA SZTAKI
KOPI
KOPI Protection Instead of Copy Protection
Máté Pataki
Topics
• Plagiarism
• KOPI Portal
• How KOPI Works
• KOPI Protection
• Future Plans
Problems
• Plagiarism is a major problem at universities
• Even a single university has too many theses for anyone to be familiar with all of them
• A suspicion of plagiarism is not enough; proof is needed
Problems - Existing Systems
• Watermark or checksum
• Authorship attribution
• Open search engines
• Text comparison
• Questionnaire
• Systems with unknown algorithms
• No system for the Hungarian community
What we need
• Detects partial overlaps
• Cannot be automatically removed
• Language independent
• Can protect proprietary documents
• One-to-many comparison
• Works without user intervention
• Known algorithm
The KOPI Project
• KOPI Online Plagiarism Search and Information Portal: web-based similarity and plagiarism search service
• Partner: Monash University, Melbourne
• Sponsored by the Hungarian Government
• Developed 2003-2004
• The service is freely available to everybody
The Goal of KOPI
• Protect digital libraries from illegal copying
• Help teachers, professors, and conference organizers easily find copied work and its original source
• Inform students and authors about plagiarism, citation, and the relevant (Hungarian) laws
• Increase the value of papers and theses by certifying their genuineness
Plagiarism Search Services
• Compare uploaded documents to each other
• Find similar documents in the system's database:
  • Within the user's own documents
  • Documents uploaded by others
  • Documents from the Internet
  • Digital libraries (MEK)
  • Universities
  • …
How it works
text → chunk → fingerprint → DB → result
1. Chunking
2. Compress (MD5)
3. Upload to DB
4. Query
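The four steps can be sketched end to end as follows. This is only an illustration under stated assumptions, not the KOPI implementation: the `FingerprintDB` class, its method names, and the in-memory dictionary standing in for the database are all hypothetical, and overlapping 5-word chunks are assumed.

```python
import hashlib


def chunks(text, n=5):
    """Step 1: split the text into overlapping n-word chunks (lowercased)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def fingerprint(chunk):
    """Step 2: compress one chunk to a fixed-length MD5 hex digest."""
    return hashlib.md5(chunk.encode("utf-8")).hexdigest()


class FingerprintDB:
    """Hypothetical in-memory stand-in for the fingerprint database."""

    def __init__(self):
        self.index = {}  # fingerprint -> set of document ids

    def upload(self, doc_id, text):
        """Step 3: store every fingerprint of the document."""
        for c in chunks(text):
            self.index.setdefault(fingerprint(c), set()).add(doc_id)

    def query(self, text):
        """Step 4: return {doc_id: number of shared fingerprints}."""
        hits = {}
        for fp in {fingerprint(c) for c in chunks(text)}:
            for doc_id in self.index.get(fp, ()):
                hits[doc_id] = hits.get(doc_id, 0) + 1
        return hits


db = FingerprintDB()
db.upload("thesis-1", "The goal of the KOPI portal is to protect documents against plagiarism")
db.upload("thesis-2", "Copy protection makes a work harder to use and harder to copy")

result = db.query("Everyone knows the goal of the KOPI portal is to protect documents")
print(result)
```

Only fingerprints are stored and compared, so the database in this sketch never holds the document text itself.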
Word chunking
• Original
The goal of the KOPI online Plagiarism Search and Information Portal is to protect documents against plagiarism.
• Word chunking (n=5)
the goal of the kopi
online plagiarism search and information
portal is to protect documents
…
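A minimal sketch of non-overlapping word chunking with n=5. The function name `word_chunks` and the normalization (lowercasing, stripping basic punctuation) are assumptions made for illustration; the slide does not specify how KOPI normalizes text.

```python
def word_chunks(text, n=5):
    """Split text into consecutive, non-overlapping chunks of n words."""
    # Assumed normalization: lowercase and strip basic punctuation.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


sentence = ("The goal of the KOPI online Plagiarism Search and Information "
            "Portal is to protect documents against plagiarism.")
chunks = word_chunks(sentence)
for chunk in chunks:
    print(chunk)
```

With this scheme the last chunk may be shorter than n words, and inserting a single word near the start of a text shifts every later chunk boundary.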
Overlapping word chunking
• Original
The goal of the KOPI online Plagiarism Search and Information Portal is to protect documents against plagiarism.
• Overlapping word chunking (n=5)
the goal of the kopi
goal of the kopi online
of the kopi online plagiarism
the kopi online plagiarism search
kopi online plagiarism search and
…
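Overlapping chunking can be sketched as a sliding window that advances one word at a time. As before, the function name and the normalization are illustrative assumptions, not KOPI's documented behavior.

```python
def overlapping_word_chunks(text, n=5):
    """Return every window of n consecutive words, moving one word at a time."""
    # Assumed normalization: lowercase and strip basic punctuation.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if len(words) < n:
        # Text shorter than one window: return it as a single chunk.
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = ("The goal of the KOPI online Plagiarism Search and Information "
            "Portal is to protect documents against plagiarism.")
chunks = overlapping_word_chunks(sentence)
for chunk in chunks[:5]:
    print(chunk)
```

Because every word position starts a chunk, a copied passage still produces matching chunks even when text is inserted or removed before it, at the cost of roughly n times as many chunks as the non-overlapping scheme.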
Hash-based algorithm (MD5)
chunk → MD5 → fingerprint
Compressing fingerprints
• Input length is not limited
• Fast
• The chance of two different texts having the same MD5 code is small
• Irreversible
• Can protect proprietary documents
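The compression step can be illustrated with Python's standard `hashlib` module. The helper name `md5_fingerprint` and the UTF-8 encoding are assumptions for this sketch; the slides do not say how KOPI encodes chunks before hashing.

```python
import hashlib


def md5_fingerprint(chunk):
    """Map a chunk of any length to a 128-bit MD5 digest (32 hex characters)."""
    return hashlib.md5(chunk.encode("utf-8")).hexdigest()


a = md5_fingerprint("the goal of the kopi")
b = md5_fingerprint("goal of the kopi online")
long_fp = md5_fingerprint("word " * 10000)

print(a, len(a))
print(b, len(b))
print(len(long_fp))  # same length regardless of input size
```

Identical chunks always produce identical fingerprints, so two documents can be compared by comparing digests alone, without exchanging the underlying text.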
Copy protection
• Pros
  • Harder to copy
  • The path of the work can be tracked (DRM)
  • More income for authors and sellers
• Cons
  • Harder to use
  • Cannot totally prevent copying
  • Sometimes it must be circumvented even for legal use
  • It is not always legal to use
  • Personal rights problems (DRM)
  • Hinders the spread of the work
Copy Protection for Text Documents
• PDF, DOC… protection
  • Can be easily and automatically circumvented
• Allow only online viewing
  • Strongly restricts use
  • Harder, but can still be circumvented
• Narrow down the number of authorized users
  • Once the document is out of the system…
• Nothing protects against retyping
  • Lock it in a drawer and leave it there
KOPI Protection
• Documents uploaded into the KOPI system
  • Plagiarism can be easily discovered
  • The sources will also be known
  • The risk of plagiarizing becomes too high
  • Circumventing it is time-consuming and cannot be done automatically
• The work can be freely distributed
  • No need to deal with copy protection
  • Search engines can index it
  • More people read it
  • More people cite it
Future Plans
• Distributed system
  • Each university has its own system, but they can search each other's databases
  • Secure search with MD5 codes
• Upload databases
  • Online and offline databases
  • Documents found on the Internet
• Recognizing source code and programming languages
• SOAP interface for integrated use of KOPI
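One way the planned search across university systems with MD5 codes might look is sketched below. Everything here is a hypothetical illustration: the site names, the `remote_search` function, and the idea of returning a match count per site are assumptions, since the slides only state that the search would use MD5 codes.

```python
import hashlib


def fingerprints(text, n=5):
    """MD5 digests of all overlapping n-word chunks of the text."""
    words = text.lower().split()
    chunks = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.md5(c.encode("utf-8")).hexdigest() for c in chunks}


# Each site keeps only its own fingerprint set (hypothetical site names).
sites = {
    "university-a": fingerprints("plagiarism is a huge problem at universities today"),
    "university-b": fingerprints("each university has its own system and database"),
}


def remote_search(query_fps, sites):
    """Count shared fingerprints per site; only hashes are exchanged."""
    return {name: len(query_fps & fps) for name, fps in sites.items()}


query = fingerprints("we all agree plagiarism is a huge problem at universities")
matches = remote_search(query, sites)
print(matches)
```

In this sketch neither side sees the other's document text; a site learns only which digests overlap.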
KOPI Portal
http://kopi.sztaki.hu
Web: http://dsd.sztaki.hu
Email: Mate.Pataki@sztaki.hu
Thank you for your attention!