Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 9, 2018
This paper addresses the problem of automatically extracting facts from Russian texts. The facts under examination are the intentions of social network users to purchase certain goods or use certain services. The approach is based on semantic tagging of user messages by an expert, followed by automatic construction of rules. The training set for expert annotation consisted of messages from the "VKontakte" social network, selected through the LeadScanner API. The proposed system of semantic tags distinguishes between various intentional blocks: objects, their different properties, and emphatic constructions. Pre-processing of the training set included lemmatization and grammatical tagging with PyMorphy2. A directed graph was then constructed from the training set. Each node in this graph corresponds to an intentional block and stores its expert-assigned intentional tag together with the grammatical and/or lexical properties of its main word. The edges of the graph connect intentional blocks that occur in adjacent positions in any message of the training set. Intention objects and their properties were extracted by analyzing the test set in accordance with the constructed graph. The test set also included messages containing non-consumer intentions or no intentions at all. The results of the testing stage show that the approach makes it possible to determine whether a particular message expresses an intention and, if it does, to extract the intention object along with its relevant properties. The precision and recall of intention extraction were 81% and 74%, respectively. The extracted data can be used for further refinement of message classification.
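The graph construction and matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names (`INTENT`, `OBJECT`, `PROPERTY`), the reduction of each block to a `(tag, part_of_speech)` pair, the artificial `START` node, and the functions `build_graph` and `tag_message` are all assumptions made for illustration. The part-of-speech labels follow the OpenCorpora style used by PyMorphy2, but the input here is assumed to be already tagged, so no morphological analyzer is called.

```python
from collections import defaultdict

# Hypothetical start marker; the paper does not say how message starts are modeled.
START = ("<START>", None)


def build_graph(annotated_messages):
    """Build a directed graph from expert-annotated messages.

    Each message is a list of blocks; each block is a (tag, pos) pair,
    where `tag` is the expert-assigned intentional tag and `pos` is the
    part of speech of the block's main word. An edge links two blocks
    that occur in adjacent positions in at least one training message.
    """
    edges = defaultdict(set)
    for blocks in annotated_messages:
        prev = START
        for block in blocks:
            edges[prev].add(block)
            prev = block
    return edges


def tag_message(edges, pos_sequence):
    """Assign intentional tags to an untagged message, if the graph allows it.

    Searches depth-first for a path from START whose nodes match the
    given part-of-speech sequence. Returns the list of tags along the
    path, or None when no such path exists (no intention recognized).
    """
    def walk(node, i):
        if i == len(pos_sequence):
            return []
        for nxt in sorted(edges.get(node, ())):
            if nxt[1] == pos_sequence[i]:
                rest = walk(nxt, i + 1)
                if rest is not None:
                    return [nxt[0]] + rest
        return None

    return walk(START, 0)


# Toy training set with assumed tags; real annotations would come from an expert.
training = [
    [("INTENT", "VERB"), ("OBJECT", "NOUN")],
    [("INTENT", "VERB"), ("PROPERTY", "ADJF"), ("OBJECT", "NOUN")],
]
graph = build_graph(training)

print(tag_message(graph, ["VERB", "ADJF", "NOUN"]))  # ['INTENT', 'PROPERTY', 'OBJECT']
print(tag_message(graph, ["NOUN", "VERB"]))          # None
```

In this simplified form a message is accepted only if its whole block sequence follows edges seen in training; a practical system would presumably also match on lemmas and tolerate unrelated tokens between blocks.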