Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 1, 2015
This study proposes original procedures and techniques for the automated construction of multilingual subject dictionaries. They rely on tools for automatically compiling and statistically processing subject-specific text collections, as well as on expert knowledge represented in thesauri and catalogue-like resources. The subject dictionaries are intended for the automatic categorization of short descriptions of users' Internet activity (user queries). Experience in constructing Russian and English subject dictionaries is described and illustrated with examples. Specific issues addressed in the paper include the selection of domain texts that contain important subject-specific terms, automatic lexicon extraction (covering both single- and multi-word terms), and lexicon extension on the basis of thesaural relations. Experimental results of dictionary-based query categorization, tested across 108 domains for the two languages, are discussed.
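To make the idea of dictionary-based query categorization concrete, the following is a minimal sketch in Python. It is not the procedure used in the paper: the dictionary contents, the function names (`ngrams`, `categorize`), the whitespace tokenization, and the scoring by number of matched terms are all assumptions made for illustration only. It shows how a short query might be matched against per-domain dictionaries that contain both single- and multi-word terms.

```python
from collections import Counter

# Hypothetical toy subject dictionaries (illustrative only, not from the paper):
# each domain maps to a set of single- and multi-word terms.
SUBJECT_DICTIONARIES = {
    "travel": {"flight", "hotel", "cheap flights"},
    "finance": {"loan", "mortgage", "interest rate"},
}


def ngrams(tokens, max_len):
    """Yield every contiguous word sequence of length 1..max_len as a string."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def categorize(query, dictionaries, max_term_len=3):
    """Return the domains whose dictionaries match the query, best match first.

    Assumption: a domain's score is simply the number of distinct dictionary
    terms found in the query; a real system would likely weight terms and
    normalize word forms.
    """
    tokens = query.lower().split()
    candidates = set(ngrams(tokens, max_term_len))
    scores = Counter()
    for domain, terms in dictionaries.items():
        hits = candidates & terms
        if hits:
            scores[domain] = len(hits)
    return [domain for domain, _ in scores.most_common()]


if __name__ == "__main__":
    print(categorize("cheap flights and hotel deals", SUBJECT_DICTIONARIES))
    print(categorize("mortgage interest rate", SUBJECT_DICTIONARIES))
```

The sketch matches surface forms exactly, so "flights" does not match the dictionary entry "flight". For a morphologically rich language such as Russian, some form of lemmatization or stemming would presumably be needed before lookup.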