No. 4. Vol. 30. 2024

DOI: 10.17587/it.30.190-197

P. A. Posokhov, Programmer, E. A. Rudaleva, Programmer, S. S. Skrylnikov, Programmer, O. V. Makhnytkina, PhD, Associate Professor, V. I. Kabarov, Senior Lecturer,
ITMO University, Saint Petersburg, Russian Federation

Persona Knowledge Extraction from Dialog Data in Russian Language

The article examines the joint application of linguistic rules and machine learning models to the problem of extracting persona knowledge from dialog data in Russian. Linguistic rules based on morphological, syntactic, and grammatical features are used to annotate the training dataset automatically. A neural network model based on the T5 architecture was trained in a multitask setting on two tasks: a) generating a response from the dialog history and the facts about the agent's persona found relevant to that history; b) extracting facts about the persona, formulated as a generation task, from the agent's last utterance. The experiments used the Toloka Persona Chat Rus dataset. The metrics obtained for both approaches show that they are applicable to the Russian language, for which no such studies had been conducted before.
Keywords: personalized dialogue systems, knowledge extraction, persona knowledge, encoder-decoder models

Acknowledgements: This study was funded by a grant from the Russian Science Foundation No. 22-11-00128.

P. 190-197


