Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 1, 2024

DOI: 10.17587/prin.15.26-34
Search for Authors of Publications among Users of the Large Scientometric Data System Using the WAND Method
V. A. Vasenin, Professor, Head of Chair, vasenin@msu.ru, D. D. Zaslavskiy, Postgraduate Student, zabaf@yandex.ru, Lomonosov Moscow State University, Moscow, 119991, Russian Federation
Corresponding author: David D. Zaslavskiy, Postgraduate Student, Lomonosov Moscow State University, 119991, Moscow, Russian Federation, E-mail: zabaf@yandex.ru
Received on November 03, 2023
Accepted on November 17, 2023

Processing search queries in big data systems is an active area of research whose results find applications in many fields, including research, development and technological work (R&D). One of the main tasks in R&D management is the accounting and analysis of R&D activity and the promotion of its participants on a competitive basis. To support this, information and analytical scientometric systems are developed to aggregate published R&D results. The article discusses a specific task arising in such systems: determining which authors were involved in writing a scientific publication. Such systems store records of publications and of their authors, but often lack mechanisms that link a publication to its authors with high accuracy. The goal of the task considered in the article is to restore these missing relationships. The presented algorithm is based on the assumption that R&D work is carried out by teams of authors, so that to determine the authors of a publication it is enough to identify these teams. The article will be useful to researchers and practitioners who automate processes in large information and analytical systems for scientometrics and bibliometrics. Applying the author-team heuristic can significantly improve the accuracy and performance of a number of systems with a similar purpose, particularly those that require real-time query processing.

Keywords: author search, publications, users, WAND search, correct author determination, scientometrics, big data systems
pp. 26–34
For citation:
Vasenin V. A., Zaslavskiy D. D. Search for Authors of Publications among Users of the Large Scientometric Data System Using the WAND Method, Programmnaya Ingeneria, 2024, vol. 15, no. 1, pp. 26–34. DOI: 10.17587/prin.15.26-34 (in Russian).