Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 7, 2023

DOI: 10.17587/prin.14.329-338
A Method for Improving the Caching Strategy for Computing Systems with Shared Memory
V. A. Egunov, Assistant Professor, vegunov@mail.ru, A. G. Kravets, Professor, allagkravets@yandex.ru, Volgograd State Technical University, Volgograd, 400005, Russian Federation
Corresponding author: Vitaly A. Egunov, Assistant Professor, Volgograd State Technical University, Volgograd, 400005, Russian Federation, E-mail: vegunov@mail.ru
Received on April 14, 2023
Accepted on May 11, 2023

This paper considers the problem of increasing software efficiency in terms of reducing the costs of its development and operation when solving production and research tasks. We analysed existing approaches to this problem using the example of parameterized algorithms that implement the MVM (matrix-vector multiplication) and MMM (matrix-matrix multiplication) operations of BLAS (Basic Linear Algebra Subprograms). To increase software efficiency, we propose a new design method intended to improve data caching in software developed for computing systems with a hierarchical memory structure. Using the proposed design procedure, we developed an analytical approach to evaluating software effectiveness in terms of how well it uses a hierarchical memory subsystem. We applied the proposed method to the two-sided Householder transformation for reducing a general matrix to Hessenberg form. We then present new algorithms for this problem, which are optimized variants of the classical Householder transformation: Row-Oriented Householder and Single-Pass Householder. The use of these algorithms can significantly reduce program execution time. Computational experiments were carried out on a shared-memory parallel computing system, which is one of the nodes of the computing cluster of Volgograd State Technical University. We compared the execution time of programs that reduce general matrices to Hessenberg form, written using the proposed algorithms, with that of the LAPACKE_dgehrd() function of the Intel MKL library. The conclusions of the work are confirmed by the results of the computational experiments.
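
For orientation, the classical two-sided Householder reduction to Hessenberg form mentioned in the abstract can be sketched as follows. This is a minimal textbook-style illustration in pure Python, not the Row-Oriented or Single-Pass variants proposed in the paper; the function name `hessenberg` and the list-of-lists matrix layout are assumptions made for the example.

```python
import math

def hessenberg(a):
    """Reduce a square matrix (list of row lists) to upper Hessenberg form
    using two-sided Householder reflections. Returns a new matrix.
    Illustrative sketch only; hypothetical helper, not the paper's code."""
    n = len(a)
    h = [row[:] for row in a]  # work on a copy
    for k in range(n - 2):
        # Entries of column k below the diagonal define the reflector.
        x = [h[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Sign chosen to avoid cancellation when forming v = x - alpha * e1.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        m = len(v)
        # Left update: rows k+1..n-1 become (I - 2 v v^T / v^T v) * rows.
        for j in range(n):
            s = 2.0 * sum(v[i] * h[k + 1 + i][j] for i in range(m)) / vtv
            for i in range(m):
                h[k + 1 + i][j] -= s * v[i]
        # Right update: columns k+1..n-1 become cols * (I - 2 v v^T / v^T v).
        for i in range(n):
            s = 2.0 * sum(h[i][k + 1 + j] * v[j] for j in range(m)) / vtv
            for j in range(m):
                h[i][k + 1 + j] -= s * v[j]
    return h
```

In this sketch, with rows stored as separate lists, the left update walks down columns while the right update walks along rows, so the two halves of each step touch memory in different orders. Access patterns of this kind are what cache-oriented reformulations typically target.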

Keywords: program efficiency, memory hierarchy, cache memory, Householder transformation, efficiency evaluation
pp. 329–338
For citation:
Egunov V. A., Kravets A. G. A Method for Improving the Caching Strategy for Computing Systems with Shared Memory, Programmnaya Ingeneria, 2023, vol. 14, no. 7, pp. 329–338. DOI: 10.17587/prin.14.329-338. (in Russian)