Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 1, 2015

Experiment of Constructing Subject Dictionaries for Categorization of Short Descriptions of Web-Sites
I. S. Kononenko1, 2, Researcher, e-mail: irina_k@cn.ru, N. V. Salomatina1, 3, Researcher, e-mail: nataly@math.nsc.ru, E. A. Sidorova1, 2, Senior Researcher, e-mail: lena@iis.nsk.su
1 Novosibirsk State University,
2 A. P. Ershov Institute of Informatics Systems, SB RAS, Novosibirsk,
3 Sobolev Institute of Mathematics, SB RAS, Novosibirsk

This study proposes original procedures and techniques for the automated construction of multilingual subject dictionaries. They rely on tools for automatically compiling and statistically processing subject-specific text collections, as well as on expert knowledge represented in thesauri and catalogue-like resources. The subject dictionaries are intended for automatic categorization of short descriptions of users' Internet activities (user queries). The construction of Russian and English subject dictionaries is described and illustrated with examples. The paper addresses the selection of domain texts that contain important subject-specific terms, automatic lexicon extraction (covering both single- and multi-word terms), and lexicon extension based on thesaurus relations. Experimental results of dictionary-based query categorization, tested across 108 domains for the two languages, are discussed.

Keywords: subject dictionary, subject-specific text collection, metasearch, thesaurus, catalogue, query categorization, Internet activity
pp. 41–48