Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 12, 2025

DOI: 10.17587/prin.16.632-645
Development of an Information-Reference System for a Knowledge Base on Natural and Technogenic Safety of Siberian Regions
S. E. Popov, Senior Researcher, popov@ict.nsc.ru, V. P. Potapov, Chief Researcher, vadimptpv@gmail.com, V. V. Moskvichev, Chief Researcher, krasn@ict.nsc.ru, N. A. Chernyakova, Senior Researcher, fortuna@ict.nsc.ru, Federal Research Center for Information and Computational Technologies, Novosibirsk, 630090, Russian Federation
Corresponding author: Semion E. Popov, Senior Researcher, Federal Research Center for Information and Computational Technologies, Novosibirsk, 630090, Russian Federation, E-mail: popov@ict.nsc.ru
Received on June 17, 2025
Accepted on July 15, 2025

This article presents a modern approach to developing a question-answering system that supports decision making in territorial governance and improves environmental and industrial safety in Siberian regions. The proposed system uses large language models (LLMs) combined with Retrieval-Augmented Generation (RAG). The knowledge base integrates heterogeneous data sources, including federal project documentation, digital monitoring outputs, and geospatial datasets, enabling proactive risk assessment and regulatory compliance solutions for governmental and industrial stakeholders. For efficient information storage and retrieval, the system uses a PostgreSQL database extended with pgvector for vector similarity search. LLM inference is served via Text Generation Inference (TGI), with a backend built on FastAPI and a React-based frontend. Experimental evaluation demonstrates that re-ranking based on token-level query-document matching significantly improves answer relevance and accuracy. The system generates responses in HTML format, which allows structured presentation of information with hyperlinks to relevant figures, tables, and bibliographic references. When available, associated visual elements such as charts, maps, and tabular data are embedded dynamically alongside the generated text, so users can access supporting materials directly in the response interface without switching between views or documents. These capabilities improve the interpretability and traceability of the provided information, particularly for domain experts who rely on contextual details and evidence. The graphical user interface also makes the system accessible to non-technical users by offering interactive navigation through complex datasets and analytical summaries.
All components of the system have been designed with modularity and scalability in mind, enabling seamless adaptation to other subject domains. As a result, the solution demonstrates strong potential for broader application in public administration, emergency response planning, and risk management contexts beyond the scope of the initial implementation.
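The re-ranking step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so the example assumes a late-interaction (MaxSim-style) score: each query token embedding is matched to its most similar document token embedding, and the maxima are summed. The function names (`maxsim_score`, `rerank`), the chunk identifiers, and the toy two-dimensional vectors are hypothetical and are not taken from the described system.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best cosine match among document tokens."""
    if not doc_tokens:
        return 0.0
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(
    query_tokens: List[Vector],
    candidates: Dict[str, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-order candidates (e.g. chunks returned by a first-stage vector search)."""
    scored = [
        (doc_id, maxsim_score(query_tokens, tokens))
        for doc_id, tokens in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy token embeddings (assumed values, for illustration only).
query = [(1.0, 0.0), (0.0, 1.0)]
candidates = {
    "chunk_a": [(1.0, 0.0), (0.0, 1.0)],  # matches both query tokens
    "chunk_b": [(1.0, 0.0), (1.0, 0.0)],  # matches only the first query token
    "chunk_c": [(-1.0, 0.0)],             # points away from the first query token
}
ranking = rerank(query, candidates)
print(ranking)
```

In a pipeline of this kind, the candidates would typically come from a coarse nearest-neighbour search over single-vector chunk embeddings, and only the top few re-ranked chunks would be passed to the LLM as context.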

Keywords: large language model (LLM), contextual question-answering system, vector database, retrieval-augmented generation (RAG), Text Generation Inference (TGI), FastAPI, React
pp. 632—645
For citation:
Popov S. E., Potapov V. P., Moskvichev V. V., Chernyakova N. A. Development of an Information-Reference System for a Knowledge Base on Natural and Technogenic Safety of Siberian Regions, Programmnaya Ingeneria, 2025, vol. 16, no. 12, pp. 632—645. DOI: 10.17587/prin.16.632-645. (in Russian).
The research was carried out within the state assignment of the Ministry of Science and Higher Education of the Russian Federation for the Federal Research Center for Information and Computational Technologies.