Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 7, 2017

DOI: 10.17587/prin.8.328-336
Review of Methods for Classifying Text Documents Based on the Machine Learning Approach
E. I. Burlayeva, ekaterina0853@mail.ru, Donetsk National Technical University, Pokrovsk, Donetsk region, 85300, Ukraine
Corresponding author: Burlayeva Ekaterina I., Graduate Student, Donetsk National Technical University, Pokrovsk, Donetsk region, 85300, Ukraine, E-mail: ekaterina0853@mail.ru
Received on March 23, 2017
Accepted on May 04, 2017

Interest in effective tools for automatic text processing is growing because the volume of textual information in electronic form has increased sharply, which creates a need to automate many kinds of work with it. Automatic text classifiers can be useful in almost any system that uses text documents to represent information, since they reduce the labor cost of finding the required information in electronic texts. To cope with the variety of documents, rules for compiling them and forms and methods for working with them have been developed. One technology for processing textual information is the automatic classification of text documents, which assigns a document to one of several categories based on its content. An important step in solving the text classification task is choosing the machine learning method that will be applied to the vector representation of the document. This article presents a comparative analysis of machine learning methods used for multi-class classification of text documents. Four classification algorithms are compared and studied: the support vector machine (SVM), latent semantic analysis (LSA), the naive Bayes classifier, and the decision tree. There remains a need to improve the quality and speed of text classification by combining the advantages of known text classification algorithms.
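As an illustration of the task described above, the following minimal sketch assigns a document to one of several categories using a bag-of-words representation and a multinomial naive Bayes classifier with add-one smoothing. It is not the article's implementation: the function names, the whitespace tokenizer, and the six toy training documents are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (document text, category label).
TRAINING_DOCS = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("parliament passed the budget law", "politics"),
    ("the minister announced a new election", "politics"),
    ("the processor runs the new software", "technology"),
    ("the software update fixed the network bug", "technology"),
]


def tokenize(text):
    """Split a document into lowercase terms (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per class and term occurrences per class."""
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        class_doc_counts[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, term_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-posterior score for the document."""
    class_doc_counts, term_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    vocab_size = len(vocabulary)
    # Terms never seen in training carry no evidence and are skipped.
    tokens = [t for t in tokenize(text) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label in sorted(class_doc_counts):
        # Log prior of the class.
        score = math.log(class_doc_counts[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (term_counts[label][token] + 1) / (total_terms + vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAINING_DOCS)
print(classify("the goal decided the match", model))
```

The same bag-of-words counts could in principle be passed to any of the other methods named in the abstract; naive Bayes is used here only because it fits in a few lines without external libraries.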

Keywords: automatic classification of documents, machine learning, support vector machine (SVM), latent semantic analysis (LSA), decision trees, naive Bayes classifier
pp. 328–336
For citation:
Burlayeva E. I. Review of Methods for Classifying Text Documents Based on the Machine Learning Approach, Programmnaya Ingeneria, 2017, vol. 8, no. 7, pp. 328–336.