ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 11. Vol. 25. 2019

DOI: 10.17587/it.25.662-669

M. D. Ershov, Ph. D. Student, e-mail: ershov.m.d@rsreu.ru, Ryazan State Radio Engineering University, Ryazan, 390005, Russian Federation

First-Order Optimization Methods in Machine Learning

The problems arising in the training of multilayer feedforward neural networks due to the shortcomings of the gradient descent method are considered. A review of first-order optimization methods that are widely used in machine learning, as well as of less well-known methods, is performed. The review includes a brief description of one of the training methods for neural networks: the backpropagation method (backward propagation of errors). A separate section is devoted to the gradient descent optimization method and to the convergence problems that arise when backpropagation is combined with gradient descent. The review considers the following first-order optimization methods, including methods with an adaptive learning rate: gradient descent with momentum, the Nesterov accelerated gradient method (NAG), AdaGrad, RMSprop, AdaDelta, Adam, AdaMax, Nadam, AMSGrad, ND-Adam, NosAdam, Padam, and Yogi. The features of each method and the problems of its practical use are described. It can be noted that gradient descent, momentum, and NAG form the basis for AdaGrad, Adam, and the other methods used in machine learning. In addition, the learning rate is adjusted for each parameter separately at each iteration of neural network training. Later works describe a deterioration of convergence and generalization ability associated with the use of the exponential moving average (a short-term memory of gradients). Methods such as AMSGrad, NosAdam, and Padam are aimed at solving this problem and combine the advantages of both Adam and stochastic gradient descent.
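As an illustration of the adaptive-learning-rate methods surveyed in the abstract, below is a minimal sketch of a single Adam update step (exponential moving averages of the gradient and its square, with bias correction). The function name, variable names, and hyperparameter defaults are illustrative and are not taken from the paper itself.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update for a single parameter array.

        m, v -- exponential moving averages of the gradient and squared gradient
        t    -- 1-based iteration counter used for bias correction
        """
        m = beta1 * m + (1.0 - beta1) * grad          # first moment (short-term memory of gradients)
        v = beta2 * v + (1.0 - beta2) * grad ** 2     # second moment (per-parameter scale)
        m_hat = m / (1.0 - beta1 ** t)                # bias-corrected estimates
        v_hat = v / (1.0 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
        return param, m, v

AMSGrad modifies this scheme by replacing v_hat with the elementwise maximum of all past second-moment estimates, which counteracts the short-term-memory effect of the exponential moving average mentioned in the abstract.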
Keywords: neural networks, machine learning, deep learning, optimization, gradient descent, adaptive learning rate

P. 662–669 

Acknowledgements: The studies were carried out with financial support from the scholarship of the President of the Russian Federation for young scientists and graduate students (SP-2578.2018.5).
