Plagiarism in programming courses is a persistent problem in academia, and it has grown as Internet access has become widespread. Various tools exist that detect plagiarism among students' programming solutions. These tools only suggest possibly plagiarized cases and should not be used on their own to accuse anyone of plagiarism. The indicated cases must be checked by the teacher(s), who make the final decision on whether plagiarism occurred in each specific case. The detection accuracy of these tools is not perfect: they report false positives, and some plagiarized cases are never found. One approach to improving accuracy is to apply source-code preprocessing techniques. This approach has received little attention in the existing literature, and the effect of preprocessing techniques is not known. The aim of this research is therefore to analyse the effect of source-code preprocessing on plagiarism detection accuracy in student programming assignments with a larger number of lines of code. The research will include qualitative and quantitative analysis using an open dataset and real data from student assignments collected over multiple years.
Investigator: Matija Novak