Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 9, 2019

DOI: 10.17587/prin.10.391-399
Automatic Morphological Analysis for Russian: Application-Oriented Survey
I. V. Trofimov, itrofimov@gmail.com, Ailamazyan Program Systems Institute of RAS, Pereslavl-Zalessky, 152020, Russian Federation
Corresponding author: Trofimov Igor V., Senior Researcher, Ailamazyan Program Systems Institute of RAS, Pereslavl-Zalessky, 152020, Russian Federation. E-mail: itrofimov@gmail.com
Received on July 22, 2019
Accepted on August 06, 2019

Researchers who focus on higher-level NLP tasks and developers of NLP applications often rely on off-the-shelf solutions for lower-level subtasks such as tokenization, sentence segmentation, lemmatization, morphological tagging, and dependency parsing. The paper presents an accuracy evaluation of two morphological modules for the Russian language: the one used within the Sharoff and Nivre pipeline (TreeTagger), and UDPipe. Their performance is compared against rnnmorph, the neural algorithm that showed the best results at the MorphoRuEval-2017 competition. For evaluation purposes we used the rnnmorph implementation as of May 2019. The study uses the datasets from MorphoRuEval and follows its evaluation framework. The experiments reveal in which respects and to what extent rnnmorph outperforms the state-of-the-art pipeline solutions. Specifically, rnnmorph proves to be highly accurate (> 0.95) in identifying the grammemes of nouns and pronouns, which is relevant for the syntactic analysis of Russian. Notably, rnnmorph was trained on about one-fifth as much training data as TreeTagger, the morphological analyzer in the Sharoff and Nivre pipeline. At the same time, rnnmorph is fairly slow, and the trained model at hand fails to generate a number of key morphological features. The comparative data and supporting analyses presented in the paper are intended to help software designers who must choose a morphological analyzer to build into their applications.
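To make the notion of per-category accuracy concrete, the following minimal sketch computes token-level tagging accuracy grouped by part of speech. It is not the official MorphoRuEval scorer: the function names `parse_feats` and `accuracy_by_pos`, the feature-string format (Universal Dependencies style, `Key=Value|Key=Value`), and the toy gold/predicted tags are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def accuracy_by_pos(gold, pred):
    """Return {gold POS: share of tokens tagged fully correctly}.

    gold, pred: aligned lists of (pos, feats) tuples, one per token.
    A token counts as correct only if the POS and every feature match;
    the order of features inside the string does not matter.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for (g_pos, g_feats), (p_pos, p_feats) in zip(gold, pred):
        total[g_pos] += 1
        if g_pos == p_pos and parse_feats(g_feats) == parse_feats(p_feats):
            correct[g_pos] += 1
    return {pos: correct[pos] / total[pos] for pos in total}


# Toy data invented for this example (not taken from any evaluation set).
gold = [
    ("NOUN", "Case=Nom|Number=Sing"),
    ("NOUN", "Case=Gen|Number=Plur"),
    ("PRON", "Case=Acc|Number=Sing"),
    ("VERB", "Tense=Past"),
]
pred = [
    ("NOUN", "Number=Sing|Case=Nom"),   # same features, different order
    ("NOUN", "Case=Acc|Number=Plur"),   # wrong case
    ("PRON", "Case=Acc|Number=Sing"),
    ("VERB", "Tense=Past"),
]

scores = accuracy_by_pos(gold, pred)
print(scores)
```

Grouping by the gold part of speech is one possible design choice; it makes it straightforward to read off figures such as the accuracy on nouns and pronouns separately from the overall score.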

Keywords: natural language processing, morphological analysis, MorphoRuEval, TreeTagger, UDPipe, rnnmorph, morphological tagsets, lemmatization, Russian language
pp. 391–399
For citation:
Trofimov I. V. Automatic Morphological Analysis for Russian: Application-Oriented Survey, Programmnaya Ingeneria, 2019, vol. 10, no. 9–10, pp. 391–399.
The reported study was funded by RFBR according to the research project No. 19-07-00779.