Journal "Software Engineering" (Programmnaya Ingeneria) | Algorithm for Extracting Key Concepts from Educational Programs of IT Specialists using the Hybrid Context Ranking Method

Issue No. 7, 2025

DOI: 10.17587/prin.16.334-346

Algorithm for Extracting Key Concepts from Educational Programs of IT Specialists using the Hybrid Context Ranking Method

R. A. Fayzrakhmanov, Professor, fayzrakhmanov@gmail.com, E. V. Dolgova, Professor, shagrata@mail.ru, I. I. Sukhikh, Postgraduate Student, vargostelemax@gmail.com, Perm National Research Polytechnic University, Perm, 614990, Russian Federation

Corresponding author: Ilya I. Sukhikh, Postgraduate Student, Perm National Research Polytechnic University, Perm, 614990, Russian Federation. E-mail: vargostelemax@gmail.com

Received on March 17, 2025

Accepted on April 21, 2025

In the context of rapid digital technology development and the growing volume of educational materials, ensuring interdisciplinary consistency in academic courses has become a critically important task for higher education institutions. The annual increase in digital data within educational systems underscores the need to develop effective methods for processing and analyzing curricula. Interdisciplinary consistency in an educational program involves creating coherent and logically sequential content across different courses, including identifying common themes, eliminating redundancy, and ensuring the correct sequence of material (e.g., introducing basic concepts before more complex ones). This consistency enhances the quality of IT specialist training, ensuring their competitiveness in the labor market, where professionals with comprehensive knowledge and skills are in high demand. A key step toward achieving this goal is automating the process of extracting key concepts from course syllabi. Key concepts are high-level ideas reflecting the core content of an academic discipline, which can be used to analyze connections between courses within an educational program. However, existing key concept extraction methods have significant limitations. For example, statistical approaches like TF-IDF, based on word frequency, cannot differentiate between different meanings of a polysemous term or account for semantic relationships between concepts. Graph-based methods like PageRank focus on structural relationships between words but often ignore their contextual meaning. These shortcomings are particularly evident when analyzing educational texts, where the same concept may be interpreted differently depending on the discipline or section of the syllabus. To overcome these limitations, a hybrid method called ContextualRank is proposed, combining semantic and contextual similarity analysis using pre-trained language models such as BERT and T5. 
This approach considers not only frequency characteristics and structural relationships but also the contextual usage of key concepts, making it more effective for analyzing course syllabi. The method uses a graph model with edge weights calculated through a combination of cosine similarity of vector representations and contextual metrics, along with the TextRank algorithm for concept ranking. To assess the relevance of key concepts to course objectives, a mechanism for analyzing direct and transitive connections was implemented using the Floyd-Warshall algorithm. Experiments conducted on data from Perm National Research Polytechnic University demonstrated that ContextualRank outperforms TF-IDF by 21% in F-measure, achieving a precision of 0.7 and a recall of 0.93. Results were visualized as graphs highlighting key concepts and their connections to course objectives. The study demonstrates the potential of the method for automating the analysis of curricula, improving their structure, and adapting to dynamic labor market requirements.
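The pipeline outlined in the abstract (hybrid edge weights, TextRank ranking, Floyd-Warshall reachability) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hand-written three-dimensional vectors stand in for BERT or T5 embeddings, the contextual metric is assumed here to be a Jaccard overlap of the syllabus sections in which each concept occurs, and the names `alpha`, `threshold`, `build_weights`, `textrank`, and `floyd_warshall` are hypothetical, as is the choice of `1 - weight` as the edge cost.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Assumed contextual metric: overlap of the sections where concepts occur."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_weights(vectors, sections, alpha=0.5, threshold=0.3):
    """Symmetric weight matrix: alpha * semantic + (1 - alpha) * contextual."""
    names = sorted(vectors)
    n = len(names)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = names[i], names[j]
            score = (alpha * cosine(vectors[a], vectors[b])
                     + (1 - alpha) * jaccard(sections[a], sections[b]))
            if score >= threshold:  # drop weak edges (assumed cut-off)
                w[i][j] = w[j][i] = score
    return names, w

def textrank(w, damping=0.85, iterations=50):
    """Weighted TextRank computed by fixed-point iteration."""
    n = len(w)
    strength = [sum(row) for row in w]  # total edge weight per node
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / strength[j] * scores[j]
                for j in range(n) if w[j][i] > 0
            )
            for i in range(n)
        ]
    return scores

def floyd_warshall(w):
    """All-pairs shortest paths with edge cost 1 - weight (assumed cost)."""
    n = len(w)
    dist = [[0.0 if i == j else (1.0 - w[i][j] if w[i][j] > 0 else math.inf)
             for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Toy data: placeholder embeddings and the section numbers where each concept occurs.
vectors = {
    "algorithms": [1.0, 0.0, 0.0],
    "sorting": [0.9, 0.1, 0.0],
    "indexing": [0.5, 0.5, 0.0],
    "databases": [0.0, 1.0, 0.0],
    "sql": [0.0, 0.9, 0.1],
}
sections = {
    "algorithms": {1, 2},
    "sorting": {2},
    "indexing": {2, 3},
    "databases": {3, 4},
    "sql": {4},
}

names, weights = build_weights(vectors, sections)
scores = textrank(weights)
dist = floyd_warshall(weights)

ranking = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

a, s = names.index("algorithms"), names.index("sql")
print("direct edge algorithms-sql:", weights[a][s] > 0)
print("transitive path cost algorithms-sql:", round(dist[a][s], 3))
```

In this toy graph "algorithms" and "sql" share no direct edge, yet the Floyd-Warshall pass finds a finite path cost between them through intermediate concepts, which is the kind of transitive connection the abstract refers to. A real pipeline would replace the placeholder vectors with embeddings from a pre-trained language model and would treat course objectives as additional graph nodes.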

Keywords: interdisciplinary consistency, key concept extraction, hybrid methods, language model, TextRank, TF-IDF, PageRank, ContextualRank, educational programs, IT specialists, semantic similarity, contextual analysis

pp. 334–346

For citation:
Fayzrakhmanov R. A., Dolgova E. V., Sukhikh I. I. Algorithm for Extracting Key Concepts from Educational Programs of IT Specialists using the Hybrid Context Ranking Method, Programmnaya Ingeneria, 2025, vol. 16, no. 7, pp. 334–346. DOI: 10.17587/prin.16.334-346 (in Russian).
