ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 9. Vol. 28. 2022

DOI: 10.17587/it.28.465-474

J. H. Mohammad, Graduate Student, A. M. Mansour, Graduate Student, Y. A. Kravchenko, Associate Professor, V. V. Bova, Associate Professor,
Southern Federal University, Taganrog, 347922, Russian Federation

Key Phrases Extraction Method Based on a New Ranking Function

A new key phrase extraction method based on statistical approaches and contextual embedding models is proposed. The method identifies the main candidate keywords for a document based on their frequencies and then weights them by their similarity to the document. The frequency of each n-gram type is calculated independently to ensure fair competition between them at the ranking stage. To assess the importance of a candidate keyword in a document, context-based embeddings are used to represent both the document and its candidate keywords, and their similarity is then measured. To prevent bias in the similarity values of long keywords, a function is introduced that normalizes the similarity score of each candidate based on its length. The final importance score of a keyword is given by a score function that returns the harmonic mean of its frequency and its similarity. The method is tested on five benchmark datasets against several state-of-the-art baselines, including TF-IDF, TextRank, and YAKE, and outperforms all of them in terms of the MAP@K ranking metric.
Keywords: Automatic Keyword Extraction, Key Phrase, Keyword Ranking, TextRank, BERT, TF-IDF, Term Frequency
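
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`harmonic_mean`, `normalized_frequencies`, `rank`) are hypothetical, the frequency normalization (dividing by the largest count among candidates of the same n-gram length) is an assumption, and the similarity values are taken as given inputs, assumed to have been computed from contextual embeddings and already normalized by candidate length.

```python
from collections import Counter


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative values; 0.0 if either is not positive."""
    if a <= 0.0 or b <= 0.0:
        return 0.0
    return 2.0 * a * b / (a + b)


def normalized_frequencies(candidates: list[str]) -> dict[str, float]:
    """Count candidates and scale each count within its own n-gram length group.

    ASSUMPTION: the abstract only says frequencies are computed independently
    per n-gram type; dividing by the group maximum is one simple way to do so.
    """
    counts = Counter(candidates)
    max_by_len: dict[int, int] = {}
    for phrase, count in counts.items():
        n = len(phrase.split())
        max_by_len[n] = max(max_by_len.get(n, 0), count)
    return {p: c / max_by_len[len(p.split())] for p, c in counts.items()}


def rank(candidates: list[str], similarity: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidates by the harmonic mean of frequency and similarity.

    ASSUMPTION: `similarity` maps each candidate to a document-similarity
    score in [0, 1] that has already been length-normalized.
    """
    freqs = normalized_frequencies(candidates)
    scores = {p: harmonic_mean(f, similarity.get(p, 0.0)) for p, f in freqs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy candidates and similarity values, invented purely for illustration.
    cands = ["key phrase", "key phrase", "ranking", "ranking", "ranking", "bert"]
    sims = {"key phrase": 0.8, "ranking": 0.4, "bert": 0.9}
    for phrase, score in rank(cands, sims):
        print(f"{phrase}: {score:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two arguments, a candidate ranks highly only when it is both frequent and similar to the document. In the toy data, "bert" has the highest similarity but a low relative frequency, so it ranks last.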

P. 465–474
