Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 9, 2024

DOI: 10.17587/prin.15.465-475
Using Large Language Models to Classify Some Vulnerabilities in Program Code
V. V. Shvyrov, Associate Professor, D. A. Kapustin, Associate Professor, R. N. Sentyay, Senior Lecturer, T. I. Shulika, Assistant, Lugansk State Pedagogical University, Lugansk, 91011, Russian Federation
Corresponding author: Denis A. Kapustin, Associate Professor, Lugansk State Pedagogical University, Lugansk, 91011, Russian Federation, E-mail:
Received on June 20, 2024
Accepted on July 23, 2024

The paper studies the effectiveness of large language models in detecting common types of vulnerabilities in Python program code. In particular, the CodeBERT-python model is fine-tuned using low-rank adaptation (LoRA). The models are trained on the authors' own dataset, which consists of labeled Python code. The trained models are then used to detect and classify potential vulnerabilities. To evaluate the models, the numbers of false positives, false negatives, true positives, and true negatives are determined, and accuracy, recall, and F1 score are calculated on a test dataset for various configurations of training hyperparameters.
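
The evaluation quantities named in the abstract follow directly from the four confusion counts. The sketch below is a minimal illustration of that arithmetic for a single vulnerability class, not the authors' evaluation code. The names ConfusionCounts and classification_metrics are hypothetical, and the counts in the usage example are made up for illustration, not results from the paper. Precision is included only because the F1 score is defined through it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for one class (hypothetical container)."""
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # safe samples flagged as vulnerable
    tn: int  # safe samples left unflagged
    fn: int  # vulnerable samples that were missed


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = _safe_div(c.tp + c.tn, total)
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not come from the paper.
    example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

In a multi-class setting such as CWE classification, the same function could be applied once per class in a one-vs-rest fashion and the results averaged; whether the paper averages this way is not stated in the abstract.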

Keywords: language models, machine learning, static analysis, CodeBERT-python, LoRA, CWE, Transformer
pp. 465—475
For citation:
Shvyrov V. V., Kapustin D. A., Sentyay R. N., Shulika T. I. Using Large Language Models to Classify Some Vulnerabilities in Program Code, Programmnaya Ingeneria, 2024, vol. 15, no. 9, pp. 465–475. DOI: 10.17587/prin.15.465-475 (in Russian).
