Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 12, 2015
The article describes the key features of a syntactic parser for natural-language text, which is the fastest among the syntactic parsers developed in Russia. Linguistic analyzers for Russian and English texts built on this parser are used in social network monitoring services to automatically detect positive and negative appraisals of target objects and to extract facts about target objects and message authors.
The parser models the syntactic structure of a sentence as a tree of syntactic constituents. It applies formal grammar rules to the sequence of a sentence's constituents and constructs all possible parsing variants (constituent trees), choosing the best one, without rebuilding sub-trees that have already been constructed. To maximize parsing speed, both the parsing algorithm and the description of the natural-language grammar rules are written in C++.
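The idea of building every constituent only once can be illustrated with a generic chart (CYK-style) recognizer. This is a minimal sketch under stated assumptions, not the parser described in the article: the `Rule` structure, the `buildChart` and `parses` functions, and the restriction to binary rules are all hypothetical, and the selection of the best tree is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical binary grammar rule: left + right -> parent.
struct Rule {
    std::string parent;
    std::string left;
    std::string right;
};

// chart[start][len] holds every category found for tokens [start, start + len).
using Chart = std::vector<std::vector<std::set<std::string>>>;

// Fills the chart bottom-up. Each span is computed once and then reused by
// all longer spans, so no sub-tree is ever constructed twice.
Chart buildChart(const std::vector<std::string>& tags,
                 const std::vector<Rule>& rules) {
    const std::size_t n = tags.size();
    Chart chart(n, std::vector<std::set<std::string>>(n + 1));
    for (std::size_t i = 0; i < n; ++i) {
        chart[i][1].insert(tags[i]);  // spans of length 1: the input categories
    }
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t start = 0; start + len <= n; ++start) {
            for (std::size_t split = 1; split < len; ++split) {
                const auto& leftCell = chart[start][split];
                const auto& rightCell = chart[start + split][len - split];
                for (const Rule& r : rules) {
                    if (leftCell.count(r.left) && rightCell.count(r.right)) {
                        chart[start][len].insert(r.parent);
                    }
                }
            }
        }
    }
    return chart;
}

// True if the whole token sequence can be reduced to the goal category.
bool parses(const std::vector<std::string>& tags,
            const std::vector<Rule>& rules,
            const std::string& goal) {
    assert(!goal.empty());  // the goal category must be named
    if (tags.empty()) return false;
    const Chart chart = buildChart(tags, rules);
    return chart[0][tags.size()].count(goal) > 0;
}
```

With the two illustrative rules `Adj Noun -> NP` and `NP Verb -> S`, the sequence `Adj Noun Verb` reduces to `S`: the `NP` cell for the first two tokens is filled once and then looked up when the full span is processed.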
Embedding procedural logic in the rules (conditional branching, loops, access to class members, variable assignment) makes grammar descriptions unreadable for a human. To avoid this, and to allow a linguist unfamiliar with programming to write new rules, a declarative meta-language that works on top of C++ has been developed, together with its interpreter. The meta-language lets grammar rules be written with the declarative tools of C++, namely Boolean expressions and specially implemented procedures, while the necessary procedural component is supplied by the meta-language interpreter during text parsing. In exceptional cases, non-standard grammar rules can be written using the full procedural facilities of C++.
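One way to picture rules written as Boolean expressions in C++ is a small set of combinable conditions plus a separate function that applies them. The sketch below is an assumption made for illustration and does not reproduce the article's meta-language: `Constituent`, `Cond`, `Is`, `SameGender`, `SameNumber`, `SameCase`, `exampleRules`, and `reduce` are hypothetical names, and taking the parent's features from the right-hand constituent is a simplification.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical constituent with a category and a few agreement features.
struct Constituent {
    std::string category;  // e.g. "Adj", "Noun", "NP"
    std::string gender;
    std::string number;
    std::string gcase;     // grammatical case
};

// A condition is a Boolean test over a pair of adjacent constituents.
struct Cond {
    std::function<bool(const Constituent&, const Constituent&)> test;
    bool operator()(const Constituent& a, const Constituent& b) const {
        return test(a, b);
    }
};

// Conditions combine with && so that a rule reads as one Boolean expression.
Cond operator&&(Cond x, Cond y) {
    assert(x.test && y.test);  // both operands must hold a callable test
    return Cond{[x, y](const Constituent& a, const Constituent& b) {
        return x(a, b) && y(a, b);
    }};
}

// Primitive conditions (the "specially implemented procedures" of this sketch).
Cond Is(const std::string& left, const std::string& right) {
    return Cond{[left, right](const Constituent& a, const Constituent& b) {
        return a.category == left && b.category == right;
    }};
}
Cond SameGender() {
    return Cond{[](const Constituent& a, const Constituent& b) {
        return a.gender == b.gender;
    }};
}
Cond SameNumber() {
    return Cond{[](const Constituent& a, const Constituent& b) {
        return a.number == b.number;
    }};
}
Cond SameCase() {
    return Cond{[](const Constituent& a, const Constituent& b) {
        return a.gcase == b.gcase;
    }};
}

// A rule is purely declarative: a parent category and a condition.
struct Rule {
    std::string parent;
    Cond when;
};

// Illustrative rule: an adjective and a noun that agree form a noun phrase.
std::vector<Rule> exampleRules() {
    return {
        {"NP", Is("Adj", "Noun") && SameGender() && SameNumber() && SameCase()},
    };
}

// The "interpreter" part: all looping and assignment lives here, not in rules.
bool reduce(const std::vector<Rule>& rules, const Constituent& a,
            const Constituent& b, Constituent& out) {
    for (const Rule& r : rules) {
        if (r.when(a, b)) {
            // Assumption: the parent inherits features from the right constituent.
            out = Constituent{r.parent, b.gender, b.number, b.gcase};
            return true;
        }
    }
    return false;
}
```

In this sketch the rule author writes only the expression inside `exampleRules`, while iteration over rules and construction of the resulting constituent stay in `reduce`. An adjective and a noun that share gender, number, and case are reduced to `NP`, and a pair that disagrees in number is rejected.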
The syntactic parser is described at the level of its basic algorithm. The meta-language is described in detail, with examples of Russian grammar rule implementations and the C++ source code of the interpreter core.