Journal "Software Engineering"
a journal of theoretical and applied science and technology
ISSN 2220-3397

Issue No. 9, 2024

DOI: 10.17587/prin.15.465-475
Using Large Language Models to Classify Some Vulnerabilities in Program Code
V. V. Shvyrov, Associate Professor, slshj@yandex.ru, D. A. Kapustin, Associate Professor, kap-kapchik@mail.ru, R. N. Sentyay, Senior Lecturer, sentyayroman@yandex.ru, T. I. Shulika, Assistant, shulika-tatyana@mail.ru, Lugansk State Pedagogical University, Lugansk, 91011, Russian Federation
Corresponding author: Denis A. Kapustin, Associate Professor, Lugansk State Pedagogical University, Lugansk, 91011, Russian Federation, E-mail: kap-kapchik@mail.ru
Received on June 20, 2024
Accepted on July 23, 2024

The paper studies the effectiveness of large language models in detecting common types of vulnerabilities in Python program code. In particular, the CodeBERT-python model is fine-tuned using the low-rank adaptation (LoRA) technique. The models are trained on the authors' dataset, which consists of labeled Python source code. The trained models are then used to detect and classify potential vulnerabilities. To evaluate model effectiveness, the numbers of false positives, false negatives, true positives, and true negatives are determined. In addition, precision, recall, and the F1-score are computed on a test dataset for various configurations of the training hyperparameters.
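The evaluation described above reduces to standard formulas over the confusion-matrix counts. The sketch below, with illustrative counts rather than the paper's results, shows how precision, recall, F1-score, and accuracy follow from the numbers of true/false positives and negatives:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard classification metrics from confusion-matrix counts."""
    # Precision: share of flagged vulnerabilities that are real.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real vulnerabilities that were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Illustrative counts only, not the paper's results:
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m)  # precision 0.8, recall ~0.889, f1 ~0.842, accuracy 0.85
```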

Keywords: language models, machine learning, static analysis, CodeBERT-python, LoRA, CWE, Transformer
pp. 465—475
For citation:
Shvyrov V. V., Kapustin D. A., Sentyay R. N., Shulika T. I. Using Large Language Models to Classify Some Vulnerabilities in Program Code, Programmnaya Ingeneria, 2024, vol. 15, no. 9, pp. 465—475. DOI: 10.17587/prin.15.465-475 (in Russian).
References:
  1. Cousot P., Cousot R. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints, Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 1977, pp. 238—252. DOI: 10.1145/512950.512973.
  2. Allen F. Control flow analysis, ACM SIGPLAN Notices, 1970, vol. 5, issue 7, pp. 1—19. DOI: 10.1145/800028.808479.
  3. Shalaginov A., Banin S., Dehghantanha A., Franke K. Machine Learning Aided Static Malware Analysis: A Survey and Tutorial, Advances in Information Security, 2018, vol. 70, pp. 7—45. DOI: 10.1007/978-3-319-73951-9_2.
  4. Common Weakness Enumeration, available at: https://cwe.mitre.org/about/index.html (date of access 26.05.2024).
  5. Threat Data Bank, available at: https://bdu.fstec.ru/threat?ysclid=lwna64j8nq760938681 (date of access 26.05.2024).
  6. Rasheed Z., Sami M., Waseem M. et al. AI-powered Code Review with LLMs: Early Results, ArXiv, abs/2404.18496v1, 2024. DOI: 10.48550/arXiv.2404.18496.
  7. Fan A., Gokkaya B., Harman M. et al. Large language models for software engineering: Survey and open problems, 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), Melbourne, Australia, 2023, pp. 31—53. DOI: 10.1109/ICSE-FoSE59343.2023.00008.
  8. Chen M., Tworek J., Jun H. et al. Evaluating large language models trained on code, ArXiv, abs/2107.03374, 2021.
  9. Li H.-Y., Shi S.-T., Thung F. et al. Deepreview: automatic code review using deep multi-instance learning, Advances in Knowledge Discovery and Data Mining. 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14—17, 2019, Proceedings, Part II 23, Springer, pp. 318—330. DOI: 10.1007/978-3-030-16145-3_25.
  10. Jeon S., Kim H. AutoVAS: An automated vulnerability analysis system with a deep learning approach, Computers & Security, 2021, vol. 106, article 102308. DOI: 10.1016/j.cose.2021.102308.
  11. Sabetta A., Bezzi M. A practical approach to the automatic classification of security-relevant commits, IEEE International conference on software maintenance and evolution (ICSME), IEEE, 2018, pp. 579—582. DOI: 10.1109/ICSME.2018.00058.
  12. Vaswani A., Shazeer N., Parmar N. et al. Attention is all you need, Advances in Neural Information Processing Systems, 2017, pp. 5998—6008.
  13. Hugging Face. Models, available at: https://huggingface.co/models (date of access 12.12.2023).
  14. Pu G., Jain A., Yin J., Kaplan R. Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs, ArXiv, abs/2304.14999, 2023. DOI: 10.48550/arXiv.2304.14999.
  15. Lester B., Al-Rfou R., Constant N. The power of scale for parameter-efficient prompt tuning, ArXiv, abs/2104.08691, 2021. DOI: 10.18653/v1/2021.emnlp-main.243.
  16. Hu E., Shen Y., Wallis P. et al. LoRA: Low-Rank Adaptation of Large Language Models, ArXiv, abs/2106.09685, 2021.
  17. Zhao J., Wang T., Abid W. et al. LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report, ArXiv, abs/2405.00732, 2024. DOI: 10.48550/arXiv.2405.00732.
  18. Shazeer N., Mirhoseini A., Maziarz K. et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, ArXiv, abs/1701.06538, 2017.
  19. Wu X., Huang S., Wei F. Mixture of LoRA Experts, ArXiv, abs/2404.13628, 2024. DOI: 10.48550/arXiv.2404.13628.
  20. Zadouri T., Ustun A., Ahmadian A. et al. Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning, ArXiv, abs/2309.05444, 2023. DOI: 10.48550/arXiv.2309.05444.
  21. Liu Z., Wang H., Kang Y., Wang S. Mixture of Low-rank Experts for Transferable AI-Generated Image Detection, ArXiv, abs/2404.04883, 2024. DOI: 10.48550/arXiv.2404.04883.
  22. Huang C., Liu Q., Lin B. et al. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, ArXiv, abs/2307.13269, 2023. DOI: 10.48550/arXiv.2307.13269.
  23. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol, 2019, pp. 4171—4186. DOI: 10.18653/v1/N19-1423.
  24. Kapustin D., Shvyrov V., Shulika T. Static Analysis of Corpus of Source Codes of Python Applications, Program Comput. Soft., 2023, vol. 49, pp. 302—309. DOI: 10.1134/S0361768823040072.
  25. Welcome to the Bandit documentation! Bandit documentation, available at: https://bandit.readthedocs.io/en/latest/ (date of access 26.05.2024).
  26. Secure, Reliable, and Intelligent Systems Lab / SRI Group Website, available at: https://www.sri.inf.ethz.ch/ (date of access 26.05.2024).
  27. Shvyrov V., Kapustin D., Kushchenko A., Sentyay R. Large language models fine-tuning with the LoRA technique to solve problems of static analysis of program code, Bulletin of the Luhansk State Pedagogical University named after V. Dahl, 2023, no. 12 (78), pp. 210—215 (in Russian).
  28. Zhou S., Alon U., Agarwal S., Neubig G. CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code, Conference on Empirical Methods in Natural Language Processing, 2023. DOI: 10.48550/arXiv.2302.05527.
  29. Feng Z., Guo D., Tang D. et al. CodeBERT: A PreTrained Model for Programming and Natural Languages, ArXiv, abs/2002.08155, 2020. DOI: 10.18653/v1/2020.findings-emnlp.139.
  30. codeparrot/github-code Datasets at Hugging Face, available at: https://huggingface.co/datasets/codeparrot/github-code (date of access 26.05.2024).
  31. Transformers model to LoRA target module mapping, available at: https://github.com/huggingface/peft/blob/632997d1fb776c3cf05d8c2537ac9a98a7ce9435/src/peft/utils/other.py#L202 (date of access 26.05.2024).
  32. Stehman S. Selecting and interpreting measures of thematic classification accuracy, Remote Sensing of Environment, 1997, vol. 62, no. 1, pp. 77—89. DOI: 10.1016/S0034-4257(97)00083-7.