L. V. Savchenko, Ph. D., e-mail: lsavchenko@hse.ru, National Research University Higher School of Economics, N. Novgorod

This paper considers the problem of computer-assisted language and pronunciation learning based on deep neural networks and the information theory of speech perception. First, the user learns a stable pronunciation of the words. The user's best utterances, those assigned a high posterior probability by a pre-trained convolutional neural network, are added to the training set. This training set is then used to fine-tune the same convolutional neural network. If new utterances are successfully recognized by the resulting network, the pronunciation of all words is concluded to be distinguishable. In that case, to additionally verify the stability of pronunciation of each class (word), the closeness of the user's pronunciations is estimated by computing the average Kullback-Leibler information discrimination between each signal and the centroid reference of the class. If this mean discrimination for a particular word exceeds a certain threshold, the training for that word should be repeated. Experimental results for learning English words showed that, with existing acoustic models, the proposed approach is more accurate and faster than conventional techniques.

P. 313–31
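The stability check described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes each utterance is summarized by a vector of non-negative spectral energies that is normalized into a discrete distribution, and that the centroid reference of a class is the element-wise mean of those distributions. The function names, the example words, and the threshold value are all hypothetical.

```python
import math


def normalize(values, eps=1e-12):
    """Scale non-negative values so they sum to 1 (a discrete distribution)."""
    clipped = [max(v, eps) for v in values]  # avoid log(0) later
    total = sum(clipped)
    return [v / total for v in clipped]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def centroid(distributions):
    """Assumed centroid reference: the element-wise mean of the distributions."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def mean_discrimination(utterances):
    """Average KL divergence between each utterance and the class centroid."""
    dists = [normalize(u) for u in utterances]
    ref = centroid(dists)
    return sum(kl_divergence(d, ref) for d in dists) / len(dists)


def words_to_repeat(utterances_by_word, threshold):
    """Return the words whose mean discrimination exceeds the threshold."""
    return [
        word
        for word, utterances in utterances_by_word.items()
        if mean_discrimination(utterances) > threshold
    ]


# Hypothetical data: two utterances per word, four spectral bands each.
utterances_by_word = {
    "ship": [[4, 3, 2, 1], [4, 3, 2, 1]],   # identical, so pronunciation is stable
    "sheep": [[9, 1, 1, 1], [1, 1, 1, 9]],  # very different, so unstable
}

THRESHOLD = 0.1  # hypothetical value; the abstract does not state one

print(words_to_repeat(utterances_by_word, THRESHOLD))
```

With this toy data only the word whose two utterances differ strongly is flagged for repeated training. In practice the feature representation, the definition of the centroid, and the threshold would follow the paper's acoustic model rather than these simplifying assumptions.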