ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 4. Vol. 30. 2024

DOI: 10.17587/it.30.190-197

P. A. Posokhov, Programmer, E. A. Rudaleva, Programmer, S. S. Skrylnikov, Programmer, O. V. Makhnytkina, PhD, Associate Professor, V. I. Kabarov, Senior Lecturer,
ITMO University, Saint Petersburg, Russian Federation

Persona Knowledge Extraction from Dialog Data in Russian Language

The article considers the joint use of linguistic rules and machine learning models for extracting persona knowledge from dialog data in Russian. Linguistic rules based on morphological, syntactic and grammatical features are used to label the training dataset automatically. A neural network model based on the T5 architecture was trained in multitask mode on two tasks: a) generating an answer from the dialog history and the facts about the agent's persona found relevant to that history; b) extracting facts about the persona by generation from the agent's last utterance. The experiments used the Toloka Persona Chat Rus dataset. The metrics obtained for both approaches indicate that they are applicable to the Russian language, for which no such studies had been conducted previously.
Keywords: personalized dialogue systems, knowledge extraction, persona knowledge, encoder-decoder models
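
The multitask setup described in the abstract can be illustrated with a minimal sketch of how one dialog sample might be turned into two text-to-text training pairs, one per task. This is an assumption-laden illustration, not the authors' implementation: the task prefixes, the `DialogSample` fields, the separators, and the `"none"` placeholder are all hypothetical, since the abstract does not specify the input format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical task prefixes; the abstract does not specify the prompt format.
GENERATE_PREFIX = "generate answer:"
EXTRACT_PREFIX = "extract persona:"


@dataclass
class DialogSample:
    """One training sample (hypothetical structure, for illustration only)."""
    history: List[str]        # dialog turns, oldest first
    persona_facts: List[str]  # persona facts found relevant to the history
    agent_reply: str          # last utterance of the agent
    reply_facts: List[str]    # persona facts expressed in that utterance


def build_multitask_pairs(sample: DialogSample) -> List[Tuple[str, str]]:
    """Return (source, target) text pairs for both tasks."""
    # Task a): answer generation from relevant persona facts and dialog history.
    generation_source = (
        f"{GENERATE_PREFIX} persona: {' '.join(sample.persona_facts)} "
        f"history: {' | '.join(sample.history)}"
    )
    # Task b): persona fact extraction from the agent's last utterance.
    extraction_source = f"{EXTRACT_PREFIX} {sample.agent_reply}"
    # "none" is an assumed placeholder for utterances that carry no facts.
    extraction_target = " ".join(sample.reply_facts) if sample.reply_facts else "none"
    return [
        (generation_source, sample.agent_reply),
        (extraction_source, extraction_target),
    ]


if __name__ == "__main__":
    # English stand-in for a Russian dialog sample.
    sample = DialogSample(
        history=["Hi!", "What are you doing?"],
        persona_facts=["I love dogs."],
        agent_reply="Walking my dog.",
        reply_facts=["I have a dog."],
    )
    for source, target in build_multitask_pairs(sample):
        print(source, "->", target)
```

Both pairs could then be mixed in a single training stream for an encoder-decoder model, with the prefix telling the model which task to perform.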

Acknowledgements: This study was funded by a grant from the Russian Science Foundation, No. 22-11-00128, https://www.rscf.ru/project/22-11-00128/

P. 190-197

