Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 4, 2024

DOI: 10.17587/prin.15.206-215
Methods of Automatic Analysis of Information Presentation Dynamics in Texts Based on Adaptable Dictionaries of Scientific Terms
E. V. Vopilova, Postgraduate Student, vopilova.elena@gmail.com, E. N. Kryuchkova, Ph.D., Professor, kruchkova_elena@mail.ru, Polzunov Altai State Technical University, Barnaul, 656038, Russian Federation
Corresponding author: Elena N. Kryuchkova, Ph.D., Professor, Polzunov Altai State Technical University, Barnaul, 656038, Russian Federation, Email: kruchkova_elena@mail.ru
Received on January 17, 2024
Accepted on February 07, 2024

The paper proposes a method for determining the dynamics of the aspectual content of scientific texts. It discusses the problems that arise in the automatic processing of scientific texts and proposes approaches to building a combined method of aspect-oriented analysis of such texts. Results of aspect-analysis experiments on scientific publications in mathematics are presented. The aspect-analysis algorithm processes two graphs: the semantic domain graph, built by automatically extracting information from the text of a mathematical encyclopedia, and the weighted semantic graph of the publication. Weighting functions are proposed for calculating the strength of dependence between professional terms linked in the semantic domain graph. While the text of a scientific publication is processed, the semantic domain graph is transformed into the semantic graph of the publication, and a two-phase algorithm applied to that graph establishes semantic dependencies between text fragments.
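The abstract describes the pipeline only at a high level and does not give the weighting functions or the two-phase algorithm. As a rough illustration of the general idea (dictionary terms linked in a weighted graph, an aspect profile computed for each text fragment, and the change between consecutive fragments read as presentation dynamics), here is a minimal Python sketch. Everything concrete in it is an assumption and not the authors' method: the Jaccard co-occurrence weighting, the term-to-aspect map, the one-hop vote propagation, the `1 - cosine` shift measure, the toy data, and all function names (`build_domain_graph`, `aspect_profile`, `aspect_dynamics`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_domain_graph(articles):
    """Hypothetical stand-in for the semantic domain graph.

    Each article is a list of terms. Assumed weighting function: Jaccard
    overlap of the sets of articles in which two terms occur."""
    occurs = defaultdict(set)
    for idx, terms in enumerate(articles):
        for term in terms:
            occurs[term].add(idx)
    graph = defaultdict(dict)
    for a, b in combinations(sorted(occurs), 2):
        shared = occurs[a] & occurs[b]
        if shared:
            weight = len(shared) / len(occurs[a] | occurs[b])
            graph[a][b] = graph[b][a] = weight
    return graph


def aspect_profile(fragment_terms, graph, term_aspect):
    """Normalised aspect distribution of one text fragment.

    Each dictionary term votes 1.0 for its own aspect and passes
    edge-weighted votes to the aspects of its graph neighbours."""
    scores = defaultdict(float)
    for term in fragment_terms:
        if term not in term_aspect:
            continue  # not a dictionary term, ignored
        scores[term_aspect[term]] += 1.0
        for neighbour, weight in graph.get(term, {}).items():
            if neighbour in term_aspect:
                scores[term_aspect[neighbour]] += weight
    total = sum(scores.values())
    return {aspect: s / total for aspect, s in scores.items()} if total else {}


def cosine(p, q):
    """Cosine similarity of two sparse aspect vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def aspect_dynamics(fragments, graph, term_aspect):
    """Profiles per fragment plus the shift (1 - cosine) between consecutive
    fragments; a larger shift means a sharper change of aspect."""
    profiles = [aspect_profile(f, graph, term_aspect) for f in fragments]
    shifts = [1.0 - cosine(p, q) for p, q in zip(profiles, profiles[1:])]
    return profiles, shifts


# Toy data (hypothetical): four "encyclopedia articles" and a term->aspect map.
articles = [["group", "ring", "field"], ["ring", "ideal"],
            ["integral", "limit", "derivative"], ["derivative", "limit"]]
term_aspect = {"group": "algebra", "ring": "algebra", "field": "algebra",
               "ideal": "algebra", "integral": "analysis",
               "limit": "analysis", "derivative": "analysis"}
graph = build_domain_graph(articles)
fragments = [["group", "ring"], ["ideal", "limit"], ["limit", "integral"]]
profiles, shifts = aspect_dynamics(fragments, graph, term_aspect)
print(profiles)  # [{'algebra': 1.0}, {'algebra': 0.375, 'analysis': 0.625}, {'analysis': 1.0}]
print(shifts)
```

In the toy run the middle fragment mixes algebra and analysis terms, so its profile lies between the two pure profiles, and the first transition shows a larger shift than the second. A real system would presumably replace the toy articles with terms extracted from the encyclopedia and lemmatize the publication text before matching terms.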

Keywords: aspect-oriented analysis, scientific vocabulary, semantic graph, classification of scientific text, automatic processing of unstructured texts
pp. 206–215
For citation:
Vopilova E. V., Kryuchkova E. N. Methods of Automatic Analysis of Information Presentation Dynamics in Texts Based on Adaptable Dictionaries of Scientific Terms, Programmnaya Ingeneria, 2024, vol. 15, no. 4, pp. 206-215. DOI: 10.17587/prin.15.206-215 (in Russian).
References:
    • Bruches E. P., Batura T. V. Method for Automatic Term Extraction from Scientific Articles Based on Weak Supervision, Vestnik NSU, Series: Information Technologies, 2021, vol. 19, no. 2, pp. 5-16. DOI: 10.25205/1818-7900-2021-19-2-5-16 (in Russian).
    • Morozov D. A., Glazkova A. V., Tyutyulnikov M. A., Iomdin B. L. Keyphrase Generation for Abstracts of the Russian-Language Scientific Articles, Vestnik NSU, Series: Linguistics and Intercultural Communication, 2023, vol. 21, no. 1, pp. 54-66. DOI: 10.25205/1818-7935-2023-21-1-54-66 (in Russian).
    • Altmami N., Menai M. Automatic Summarization of Scientific Articles: A Survey, Journal of King Saud University - Computer and Information Sciences, 2020, vol. 34, pp. 1011-1028. DOI: 10.1016/j.jksuci.2020.04.020.
    • Benites F. Information Retrieval and Knowledge Extraction for Academic Writing, Digital Writing Technologies in Higher Education, 2023, pp. 303-315. DOI: 10.1007/978-3-031-36033-6_19.
    • Choudhary K., Kelley M. ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data, 2022, available at: https://arxiv.org/abs/2209.08203 (date of access 15.01.2024).
    • Bommarito M., Katz D., Detterman E. M. LexNLP: Natural Language Processing and Information Extraction For Legal and Regulatory Texts, InfoSciRN: Legal Informatics (Topic), 2018, available at: https://api.semanticscholar.org/CorpusID:47018490 (date of access 15.01.2024).
    • Borovikova O. I., Kononenko I. S., Sidorova E. A. An approach to information extraction from clinical trials protocols on the basis of medical ontology, System Informatics, 2017, no. 9, pp. 93-110. DOI: 10.31144/si.2307-6410.2017.n9.p93-110 (in Russian).
    • Beliga S., Mestrovic A., Martincic-Ipsic S. An Overview of Graph-Based Keyword Extraction Methods and Approaches, Journal of Information and Organizational Sciences, 2015, vol. 39, pp. 1-20.
    • Lunev K. V. Graph Methods for Computing Semantic Similarity of a Pair of Keywords and Their Application to the Problem of Keywords Clustering, Programmnaya Ingeneria, 2018, vol. 9, no. 6, pp. 262-271. DOI: 10.17587/prin.9.262-271 (in Russian).
    • Dubinina E. Y. Automatic extraction of key lexical units of the scientific texts at the process of summarization, Proc. The SUAP Scientific Session, 2018, vol. 3, pp. 115-118 (in Russian).
    • Hossari M., Dev S., Kelleher J. D. TEST: A Terminology Extraction System for Technology Related Terms, Proc. The 2019 11th International Conference on Computer and Automation Engineering, 2019, pp. 78-81.
    • Danilov G., Ishankulov T., Kotik K. et al. The Classification of Short Scientific Texts Using Pretrained BERT Model, Public Health and Informatics, 2021, vol. 281, pp. 83-87. DOI: 10.3233/SHTI210125.
    • Dunn A., Dagdelen J., Walker N. et al. Structured information extraction from complex scientific text with fine-tuned large language models, 2022, available at: https://doi.org/10.48550/arXiv.2212.05238 (date of access 15.01.2024).
    • Qasemi Zadeh B., Schumann A. The ACL RD-TEC 2.0: A Language Resource for Evaluating Term Extraction and Entity Recognition Methods, Proc. The Tenth International Conference on Language Resources and Evaluation (LREC'16), 2016, pp. 1862-1868.
    • Rosario B., Hearst M. Classifying Semantic Relations in Bioscience Texts, Proc. The 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 2004, pp. 430-473.
    • Stankovic R., Krstev C., Obradovic I. et al. Rule-based Automatic Multi-word Term Extraction and Lemmatization, Proc. Term Extraction and Lemmatization. The Tenth International Conference on Language Resources and Evaluation (LREC'16), 2016, pp. 507-514.
    • Hong Z., Tchoua R., Chard K., Foster I. SciNER: Extracting Named Entities from Scientific Literature, Proc. International Conference on Computational Science, 2020, vol. 12138, pp. 308-321. DOI: 10.1007/978-3-030-50417-5_23.
    • Bachishe O. I., Kryuchkova E. N., Shushakov D. S. Problems of automatic processing of scientific texts based on extraction of information from encyclopedias of relevant domain areas, Programmnaya Ingeneria, 2023, vol. 14, no. 1, pp. 42-50. DOI: 10.17587/prin.14.42-50 (in Russian).
    • Vinogradov M. Encyclopedia of Mathematics. The Soviet Encyclopedia, Moscow, 1977 (in Russian).
    • Korney A., Kryuchkova E., Savchenko V. Information Retrieval Approach Using Semiotic Models Based on Multi-layered Semantic Graphs, High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, 2020, vol. 1304, pp. 162-177 (in Russian).
    • Qaiser S., Ramsha A. Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents, International Journal of Computer Applications, 2018, vol. 181, no. 1, pp. 25-28. DOI: 10.5120/ijca2018917395.
    • Vopilova E. V., Kryuchkova E. N. Characteristic functions for calculating the significance of terms in a semantic model of scientific knowledge representation, Proc. The 9th International Conference Knowledge - Ontologies - Theories (KONT-2023), 2023, pp. 49-53 (in Russian).
    • Dmitriev I. N. Fast algorithm of cluster analysis k-medoids, Applied Discrete Mathematics, 2018, no. 39, pp. 116-127. DOI: 10.17223/20710410/39/11 (in Russian).
    • Zagrebina S. A. Ed. Bulletin of the South Ural State University, Series Mathematics. Mechanics. Physics, 2022, vol. 14, no. 2-4, available at: https://vestnik.susu.ru/mmph/ (date of access 15.01.2024) (in Russian).
    • Sbornik: Mathematics, scientific journal, 2019, available at: http://www.mathnet.ru/msb (date of access 15.01.2024) (in Russian).
    • Wang J., Dong Y. Measurement of Text Similarity: A Survey, Information, 2020, vol. 11, no. 9. DOI: 10.3390/info11090421.
    • Pimenov I. S., Salomatina N. V. Building a model of time-varying content of thematic clusters in collections of scientific texts, Proceedings of the International Conference "APCAM", 2019, pp. 385-392. DOI: 10.24411/9999-016A-2019-10062 (in Russian).
    • Natasha. Tools for Russian NLP: segmentation, embeddings, morphology, lemmatization, syntax, NER, fact extraction, available at: https://github.com/natasha (date of access 15.01.2024).