Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 6, 2023

DOI: 10.17587/prin.14.292-300
SMALT Software Package as a Tool for the Study of Graph-Theoretic Models of Texts
K. A. Kulakov, Associate Professor, kulakov@cs.petrsu.ru, N. D. Moskin, Associate Professor, moskin@petrsu.ru, A. A. Rogov, Head of Department, rogov@petrsu.ru, R. V. Voronov, Professor, rvoronov@petrsu.ru, Petrozavodsk State University, Petrozavodsk, 185910, Russian Federation
Corresponding author: Nikolai D. Moskin, Associate Professor, Petrozavodsk State University, Petrozavodsk, 185910, Russian Federation, E-mail: moskin@petrsu.ru
Received on March 27, 2023
Accepted on April 25, 2023

The SMALT software package (Statistical Methods of Analysis of Literary Texts) was developed to support research on the attribution of literary texts. The article discusses new tools implemented in the system for storing, visualizing, comparing, and searching data. These tools support text analysis based on graph-theoretic models. Examples of philological studies performed with these tools are given.

Keywords: SMALT software package, text attribution, graph-theoretic model, visualization, comparison, storage
pp. 292–300
For citation:
Kulakov K. A., Moskin N. D., Rogov A. A., Voronov R. V. SMALT Software Package as a Tool for the Study of Graph-Theoretic Models of Texts, Programmnaya ingeneria, 2023, vol. 14, no. 6, pp. 292–300. DOI: 10.17587/prin.14.292-300 (in Russian).
References:
  1. Markov A. A. An example of a statistical study on the text "Eugene Onegin", illustrating the connection of trials in a chain, Izvestiya Imperatorskoj Akademii Nauk, 1913, vol. 7, no. 3, pp. 153–162 (in Russian).
  2. Morozov N. A. Linguistic spectra: a means to distinguish plagiarisms from the true works of a given famous author. Stylometric study, Izvestiya otd. russkogo yazyka i slovesnosti Imp. Akad. nauk. Petrograd: tip. Imp. Akad. nauk, 1915, vol. XX, no. 4, pp. 93–134 (in Russian).
  3. Milov L. V., Borodkin L. I., Ivanova T. V. et al. Ot Nestora do Fonvizina: novye metody opredeleniya avtorstva / Ed. L. V. Milov. Moscow, Progress, 1994, 445 p. (in Russian).
  4. Shevelev O. G., Petrakov A. V. Text classification with decision trees and feed-forward neural networks, Vestnik Tomskogo gosudarstvennogo universiteta, 2006, no. 290, pp. 300–307 (in Russian).
  5. Velikanova N. P., Orekhov B. V. Digital textology: text attribution on the example of the novel by M. A. Sholokhov "Quiet Flows the Don", Mir Sholohova, 2019, vol. 1, no. 11, pp. 70–82 (in Russian).
  6. Masaeva O. S. Burrows' Delta method, Research work of students and young scientists: materials of the 74th All-Russian (with international participation) scientific conference of students and young scientists, Petrozavodsk, 2022, pp. 262–266 (in Russian).
  7. Marusenko M. A. Attribution of anonymous and pseudonymous texts as a typical task of pattern recognition, Istoriografiya i istochnikovedenie otechestvennoj istorii, 2003, no. 3, pp. 116–135 (in Russian).
  8. Rogov A. A., Abramov R. V., Buchneva D. D. et al. The problem of attribution in the magazines "Time", "Epoch" and the weekly "Citizen", Petrozavodsk, Islands Publ., 2021, 400 p. (in Russian).
  9. Buchneva D. D. Problems of attribution in Dostoevsky's "Citizen": debate and arguments, Neizvestnyj Dostoevskij, 2022, vol. 9, no. 3, pp. 25–53 (in Russian).
  10. Buchneva D. D. Do modern statistical methods help in the attribution of texts? Materials of the IV International Scientific and Practical Conference "Formation of the professional competence of a philologist in a multicultural educational environment", Simferopol, 2021, pp. 47–53 (in Russian).
  11. Zaharov V. N. Problems of attribution of anonymous articles in Dostoevsky's publications, Neizvestnyj Dostoevskij, 2022, vol. 9, no. 3, pp. 5–24 (in Russian).
  12. Zaharova O. V. Attribution in the mirror of statistics: anonymous articles in the magazines of the Dostoevsky brothers "Time" and "Epoch", Neizvestnyj Dostoevskij, 2021, vol. 8, no. 2, pp. 81–106 (in Russian).
  13. Lebedev A. A. Analysis of sequences of parts of speech and the category of idiostyle, Uchenye zapiski Petrozavodskogo gosudarstvennogo universiteta, 2021, vol. 43, no. 5, pp. 29–31. DOI: 10.15393/uchz.art.2021.633 (in Russian).
  14. Rogov A. A., Kulakov K. A., Moskin N. D. Software support in solving the problem of text attribution, Programmnaya ingeneria, 2019, vol. 10, no. 5, pp. 234–240. DOI: 10.17587/prin.10.234-240 (in Russian).
  15. Moskin N. D. Algorithms for comparing graphs and graph-theoretic models, Petrozavodsk, PetrSU Publ., 2009, 84 p. (in Russian).
  16. Moskin N. D., Kulakov K. A., Rogov A. A., Abramov R. V. Research the Stability of Decision Trees Using Distances on Graphs, Trudy instituta sistemnogo analiza RAN, 2023, vol. 73, no. 1, pp. 94–100. DOI: 10.14357/20790279230111.
  17. Abramov R. V., Kulakov K. A., Lebedev A. A., Moskin N. D., Rogov A. A. Research of features of Dostoevsky's publicistic style by using n-grams based on the materials of the "Time" and "Epoch" magazines, Vestnik Sankt-Peterburgskogo universiteta. Prikladnaya matematika. Informatika. Processy upravleniya, 2021, vol. 17, no. 4, pp. 389–396. DOI: 10.21638/11701/spbu10.2021.407.
  18. Moskin N. D., Kulakov K. A., Rogov A. A., Abramov R. V. Characteristics of Lexical Spectra of Texts in the Problem of Establishing Authorship, Stability and Control Processes (SCP 2020): Proceedings of the 4th International Conference Dedicated to the Memory of Professor Vladimir Zubov, series Lecture Notes in Control and Information Sciences, Springer, 2022, pp. 761–768. DOI: 10.1007/978-3-030-87966-2_87.