Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 2, 2024

DOI: 10.17587/prin.15.97-104
Detecting Similarity of Program Code Using an Aggregated Approach to the Problem of Plagiarism Detection
D. I. Solomatin, Senior Lecturer, solomatin.cs.vsu.ru@gmail.com, M. E. Novotochinov, Postgraduate Student, tyler532@yandex.ru, E. N. Desyatirikova, Professor, science2000@ya.ru, Voronezh State University, Voronezh, 394000, Russian Federation
Corresponding author: Elena N. Desyatirikova, Professor, Voronezh State University, 394000, Voronezh, Russian Federation, E-mail: science2000@ya.ru
Received on October 04, 2023
Accepted on December 05, 2023

This work examines the main strategies used to plagiarize program code and analyzes common methods for detecting copied code. Based on this analysis of existing approaches and of the subject area, and on the requirements formulated from it, a new system for automatically checking program code similarity for plagiarism was designed, implemented and tested. The system follows an aggregated approach that combines several basic similarity detection algorithms, namely the Greedy String Tiling algorithm and the Sifting algorithm. Because the system is intended for programmers, in particular teachers, and must be runnable locally, users interact with it through a command-line interface. The system is implemented in Python, which makes it platform independent. Regular expressions implement the preprocessing and exclusion functions, and the libclang library is used to parse and tokenize C++ code. Promising applications of the system include education and programming competitions: universities and colleges can use it to check student code for plagiarism, and in competitive settings such as hackathons and programming contests it can help ensure fairness by detecting plagiarism among participants.
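
The description above says that preprocessing and exclusion are done with regular expressions and that C++ tokenization is done with libclang. Purely as an illustration, the sketch below uses regular expressions for both steps: it strips comments while keeping string literals intact, then maps identifiers, numbers and literals to placeholder tokens so that renaming variables or adding comments does not change the token stream. The names `strip_comments`, `tokenize`, `_TOKEN` and the keyword subset are hypothetical and are not the system's actual API; a libclang-based tokenizer would replace the `_TOKEN` regex.

```python
import re

# Illustrative stand-in for regex preprocessing; the real system is described
# as tokenizing C++ with libclang, which this sketch does not use.
_COMMENT_OR_LITERAL = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted string literal (kept as-is)
    r"|'(?:\\.|[^'\\])*'"   # character literal (kept as-is)
    r"|//[^\n]*"            # line comment
    r"|/\*.*?\*/",          # block comment, possibly spanning lines
    re.DOTALL,
)

_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'                         # string literal
    r"|'(?:\\.|[^'\\])*'"                        # character literal
    r"|[A-Za-z_]\w*"                             # identifier or keyword
    r"|\d+(?:\.\d+)?"                            # integer or decimal number
    r"|::|->|\+\+|--|<<|>>|<=|>=|==|!=|&&|\|\|"  # two-character operators
    r"|[-+*/%=<>!&|^~?:;,.(){}\[\]#]"            # single-character punctuation
)

# Hypothetical, deliberately incomplete subset of C++ keywords kept verbatim.
KEYWORDS = {
    "auto", "bool", "break", "char", "class", "const", "continue", "double",
    "else", "float", "for", "if", "int", "long", "return", "struct", "void",
    "while",
}


def strip_comments(source: str) -> str:
    """Replace comments with a space; string and char literals survive intact."""
    def keep_literals(match: re.Match) -> str:
        text = match.group(0)
        return text if text[0] in "\"'" else " "
    return _COMMENT_OR_LITERAL.sub(keep_literals, source)


def tokenize(source: str) -> list:
    """Normalize identifiers, numbers and literals so renaming has no effect."""
    tokens = []
    for tok in _TOKEN.findall(strip_comments(source)):
        if tok[0] in "\"'":
            tokens.append("STR")
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in KEYWORDS else "ID")
        else:
            tokens.append(tok)
    return tokens
```

With this normalization, `int sum(int a, int b) { return a + b; }` and the same function with renamed parameters and an added comment yield identical token lists, which is the kind of input the similarity algorithms then compare.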
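
Greedy String Tiling, as described by Wise, repeatedly finds the longest common runs of not-yet-marked tokens in two sequences, marks them as tiles, and stops when no common run of at least a minimum length remains. The sketch below is the plain version without Wise's Karp-Rabin speed-up, and it scores similarity with the JPlag-style measure 2 * coverage / (|A| + |B|) from the Prechelt et al. report. The function names and the default `min_match=3` are illustrative assumptions, not the paper's parameters.

```python
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int]  # (start in a, start in b, length)


def greedy_string_tiling(a: Sequence, b: Sequence, min_match: int = 3) -> List[Tile]:
    """Plain Greedy String Tiling (no Karp-Rabin speed-up), O(n^3) worst case."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: List[Tile] = []
    while True:
        max_match = min_match
        matches: List[Tile] = []
        # Collect every run of equal, unmarked tokens whose length equals
        # the longest run found so far in this pass.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and not marked_a[i + k] and not marked_b[j + k]
                       and a[i + k] == b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        if not matches:
            break  # no unmarked common run of at least min_match tokens remains
        for i, j, k in matches:
            # Skip matches occluded by a tile created earlier in this pass.
            if any(marked_a[i + t] or marked_b[j + t] for t in range(k)):
                continue
            for t in range(k):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, k))
    return tiles


def gst_similarity(a: Sequence, b: Sequence, min_match: int = 3) -> float:
    """JPlag-style score: 2 * (tokens covered by tiles) / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    covered = sum(k for _, _, k in greedy_string_tiling(a, b, min_match))
    return 2 * covered / total
```

For example, `greedy_string_tiling("ABCDEFGH", "XXABCDYYEFGH")` finds the two tiles `ABCD` and `EFGH`, giving a similarity of 0.8. Token lists from any tokenizer can be passed in the same way; a very small `min_match` tends to produce chance tiles between unrelated programs.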
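
The text does not spell out how the Sifting algorithm works. Assuming it corresponds to fingerprint winnowing (Schleimer, Wilkerson and Aiken, 2003), the technique used by MOSS, a minimal sketch hashes every k-gram of tokens, keeps the smallest hash in each sliding window of `window` consecutive hashes, and compares the resulting fingerprint sets with the Jaccard index. The values `k=5` and `window=4`, the BLAKE2b hash, and all function names are illustrative choices, not the system's settings.

```python
import hashlib
from typing import List, Sequence, Set


def kgram_hashes(tokens: Sequence[str], k: int) -> List[int]:
    """64-bit BLAKE2b hash of every k-gram of consecutive tokens."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        # The \x1f separator keeps ["ab", "c"] and ["a", "bc"] distinct.
        data = "\x1f".join(tokens[i:i + k]).encode("utf-8")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        hashes.append(int.from_bytes(digest, "big"))
    return hashes


def winnow(hashes: List[int], window: int) -> Set[int]:
    """Keep the minimum hash of each window (rightmost minimum on ties)."""
    if not hashes:
        return set()
    window = min(window, len(hashes))
    fingerprints: Set[int] = set()
    last_pos = -1
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        pos = start + window - 1 - chunk[::-1].index(smallest)
        if pos != last_pos:  # record each selected position only once
            fingerprints.add(smallest)
            last_pos = pos
    return fingerprints


def winnowing_similarity(a: Sequence[str], b: Sequence[str],
                         k: int = 5, window: int = 4) -> float:
    """Jaccard index of the two fingerprint sets."""
    fa = winnow(kgram_hashes(a, k), window)
    fb = winnow(kgram_hashes(b, k), window)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0
```

Any shared run of at least `k + window - 1` tokens is guaranteed to contribute a common fingerprint, while runs shorter than `k` tokens can never match, which is why this technique complements the exact tiling above.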

Keywords: system development, plagiarism, string algorithms, programming languages, expression parsing, code parsing, tokenization, dynamic programming, hashes, Python, C++
pp. 97–104
For citation:
Solomatin D. I., Novotochinov M. E., Desyatirikova E. N. Detecting Similarity of Program Code Using an Aggregated Approach to the Problem of Plagiarism Detection, Programmnaya Ingeneria, 2024, vol. 15, no. 2, pp. 97–104. DOI: 10.17587/prin.15.97-104 (in Russian).
References:
    • Kustanto C., Liem I. Automatic Source Code Plagiarism Detection, SNPD Proceedings of the 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, Daegu, Korea, 27–29 May, 2009, IEEE Computer Society, Washington, DC, USA, 2009, pp. 481–486. DOI: 10.1109/SNPD.2009.62.
    • Chowdhury H. A., Bhattacharyya D. K. Plagiarism: Taxonomy, Tools and Detection Techniques, 19th National Convention on Knowledge, Library and Information Networking, 2018, pp. 2–14.
    • Osadchaya A. O., Isaev I. V. Search of clones in program code, Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 5, pp. 714–721. DOI: 10.17586/2226-1494-2020-20-5-714-721 (in Russian).
    • Zorkin A. Anti-plagiarism of the source code: a hybrid approach using the ANTLR parser, 2021, available at: https://habr.com/ru/articles/583882/ (in Russian).
    • Lvovich J. E., Solomatin D. I. Methods and algorithms for constructing a generator of lexico-syntactic analyzers for PEG grammars, The Bulletin of Voronezh State Technical University, 2008, vol. 4, no. 3, pp. 13–17 (in Russian).
    • Lutin V. I., Desyatirikova E. N., Makeeva O. B. Reliability of processing signals from object observation sensors in various physical fields, Theory and Technology of Radio Communications, 2018, vol. 1, pp. 58–65 (in Russian).
    • Evtifeeva O. A., Krass A. L., Lakunin M. A. et al. Analysis of algorithms for searching for plagiarism in the source codes of programs, Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics, 2007, no. 39, pp. 188–196 (in Russian).
    • Wise M. J. String similarity via greedy string tiling and running Karp-Rabin matching, 1993, pp. 1–16, available at: https://www.researchgate.net/publication/262763983_String_Similarity_via_Greedy_String_Tiling_and_Running_Karp-Rabin_Matching
    • Prechelt L., Malpohl G., Philippsen M. Finding plagiarisms among a set of programs with JPlag, Technical Report 2000-1, 2002, pp. 10–16, available at: https://page.mi.fu-berlin.de/prechelt/Biblio/jplagTR.pdf