Efficient plagiarism detection for large code repositories

Burrows, S, Tahaghoghi, S and Zobel, J 2007, 'Efficient plagiarism detection for large code repositories', Software: Practice and Experience, vol. 37, pp. 151-175.


Document type: Journal Article
Collection: Journal Articles

Title: Efficient plagiarism detection for large code repositories
Author(s): Burrows, S; Tahaghoghi, S; Zobel, J
Year: 2007
Journal name: Software: Practice and Experience
Volume number: 37
Start page: 151
End page: 175
Total pages: 25
Publisher: John Wiley and Sons
Abstract: Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems.
Keyword(s): Alignment
DOI - identifier: 10.1002/spe.750
Copyright notice: © 2006 John Wiley & Sons, Ltd.
ISSN: 0038-0644
Citation counts: Cited 43 times in Thomson Reuters Web of Science; cited 47 times in Scopus