ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 9. Vol. 30. 2024

DOI: 10.17587/it.30.467-473

D. M. Korobkin, Assistant Professor,
Volgograd State Technical University

Architecture of the Software for Updating Physical Knowledge Using a Combination of ClickHouse and HDFS

Analyzing the global patent database, physics journals, and dissertations requires processing large volumes of text and graphic data drawn from varied sources and formats. Organizing the efficient accumulation, storage, and retrieval of such data is therefore a pressing task. The article describes the concept and architecture of an automated system for updating physical knowledge, intended as information support for search design and built on a centralized data warehouse and Apache Hadoop/Spark components. The proposed architecture stores a unified physics knowledge base in two linked parts: bibliographic information in the ClickHouse DBMS, and large data blocks (text fields of patents, journal articles, images) in the HDFS distributed file system. The approach was tested on a sample of US patent documents (USPTO). The correctness of the patent sample was verified by comparing the clustering results for the patent text with the assignment of the patents to IPC classes.
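As an illustration of the linked storage described in the abstract, the following minimal Python sketch splits one parsed document into a compact bibliographic row (the kind of record that could go into ClickHouse) and a large text payload addressed by an HDFS path. The field names, the `hdfs_root` directory, and the `split_document` helper are hypothetical and are not taken from the article; no ClickHouse or HDFS client is called.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BibliographicRow:
    """Compact metadata row for a DBMS table (hypothetical schema)."""
    doc_id: str
    title: str
    ipc_class: str
    hdfs_path: str  # pointer to the large payload stored separately


def split_document(doc: dict, hdfs_root: str = "/knowledge_base/patents") -> Tuple[BibliographicRow, bytes]:
    """Split a parsed document into a metadata row and a text payload.

    The row carries only short fields plus the path of the payload,
    so the two storages stay linked through `hdfs_path`.
    """
    hdfs_path = f"{hdfs_root}/{doc['doc_id']}.txt"  # assumed path layout
    row = BibliographicRow(
        doc_id=doc["doc_id"],
        title=doc["title"],
        ipc_class=doc["ipc_class"],
        hdfs_path=hdfs_path,
    )
    payload = doc["full_text"].encode("utf-8")
    return row, payload
```

In such a layout, queries over bibliographic fields would touch only the small rows, and the payload would be read from the file system only when the full text or image is needed.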
Keywords: patent, parsing, clustering, ClickHouse, HDFS
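The verification step mentioned in the abstract compares clusters of patent text with IPC classes. The abstract does not name the comparison metric, so the sketch below uses cluster purity as an assumed stand-in: the share of documents whose IPC class matches the majority class of their cluster. The function name and the sample labels are illustrative only.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, ipc_classes):
    """Return the fraction of documents that belong to the majority
    IPC class of their own cluster (1.0 means perfect agreement)."""
    if len(cluster_ids) != len(ipc_classes):
        raise ValueError("cluster_ids and ipc_classes must have equal length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count IPC classes inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, ipc in zip(cluster_ids, ipc_classes):
        by_cluster[cluster][ipc] += 1

    # Sum the size of the majority class over all clusters.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in by_cluster.values()
    )
    return majority_total / len(cluster_ids)
```

A purity close to 1.0 would suggest that the text clusters follow the IPC grouping, while a low value would point to a mismatch between the sample and the expected classes.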

Acknowledgements: The study was supported by the Russian Science Foundation grant No. 23-21-00464, https://rscf.ru/project/23-21-00464/.

P. 467-473

References

  1. Altshuller G. S. Creativity as an exact science, Petrozavodsk, Skandinavija, 2004 (in Russian).
  2. Mukhopadhyaya A. K. Function Analysis System Technique (A Stimulating Tool), New Delhi, I K International Publishing House, 2012, 128 p.
  3. Efimov A. V. MPV analysis technique, available at: http://www.metodolog.ru/01472/01472.html (access: 20.11.2023) (in Russian).
  4. Abramov O. Y. TRIZ-Based Cause and Effect Chains Analysis vs Root Cause Analysis, Proceedings of the 11th TRIZfest-2015 International Conference, September 10-12, 2015, Seoul, South Korea, pp. 283-291.
  5. Litvin S. S. New TRIZ-based tool - Function-oriented search (FOS), In ETRIA conference TRIZ future, 2004, pp. 505-509.
  6. Polovinkin A. I. Automation of search design, Moscow, Radio i svjaz, 1981, 344 p. (in Russian).
  7. Korobkin D. M., Fomenkov S. A., Zlobin A. R., Vereshhak G. A. Formation of metrics of innovation potential and prospects for the task of technological forecasting, Informacionnye Tehnologii, 2023, vol. 29, no. 4, pp. 215-223 (in Russian).
  8. Glazunov V. N. Technology of ideas: expert systems "NOVATOR" and "EDISON", available at: http://www.trizland.ru/trizba/pdf-articles/system_novator.pdf (access: 20.11.2023) (in Russian).
  9. Zaripova V. M., Cyrulnikov E. S., Kiselev A. A. "Intellekt" for developing engineering creativity skills, Alma mater (Vestnik vysshej shkoly), 2012, no. 1, pp. 58-61 (in Russian).
  10. Bulk Data Storage System (BDSS) Version 2.0.0, USPTO, available at: https://bulkdata.uspto.gov (access: 20.11.2023).
  11. HDFS, Big Data School, available at: https://www.bigdataschool.ru/wiki/hdfs (access: 27.11.2023). (in Russian).
  12. ClickHouse Server Docker Image, dockerhub, available at: https://hub.docker.com/r/clickhouse/clickhouse-server/ (access: 24.11.2023).

 
