Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Vol. 7, No. 3, 2016
This paper proposes a matrix multiplication algorithm that uses a double block data layout. This layout substantially reduces the number of cache misses and TLB misses, and the algorithm achieves 97% of peak performance. The final section compares the proposed algorithm with existing packages (MKL, PLASMA, OpenBLAS). The author notes that the proposed algorithm supports only matrices stored in block layout, whereas MKL and OpenBLAS support matrices in the standard data layout. Consequently, the proposed algorithm does not replace existing algorithms but complements them.
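The abstract does not define the double block layout precisely. The sketch below assumes one plausible reading: a two-level tiling in which the matrix is split into outer B1 × B1 blocks, each outer block is split into inner B2 × B2 blocks, and every inner block is stored contiguously. The sizes N, B1, B2 and all function names (dbl_index, dbl_gemm, kernel, max_abs_error_vs_naive) are hypothetical illustrations, not the author's code. The inner kernel is a plain triple loop rather than a tuned micro-kernel, so it shows only the memory layout and loop structure, not the reported performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration: N must be divisible by B1,
   and B1 by B2. Real block sizes would be tuned to the cache/TLB. */
#define N  8
#define B1 4   /* outer block edge */
#define B2 2   /* inner block edge */

/* Offset of element (i, j) in the assumed two-level block layout:
   outer blocks in row-major order, inner blocks in row-major order
   inside each outer block, elements row-major inside each inner block. */
static size_t dbl_index(size_t i, size_t j) {
    size_t nb1 = N / B1, nb2 = B1 / B2;
    size_t outer = (i / B1) * nb1 + (j / B1);
    size_t ii = i % B1, jj = j % B1;
    size_t inner = (ii / B2) * nb2 + (jj / B2);
    return outer * B1 * B1 + inner * B2 * B2 + (ii % B2) * B2 + (jj % B2);
}

/* Convert between standard row-major storage and the block layout. */
static void to_block(const double *rm, double *bl) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            bl[dbl_index(i, j)] = rm[i * N + j];
}

static void from_block(const double *bl, double *rm) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            rm[i * N + j] = bl[dbl_index(i, j)];
}

/* c += a * b for contiguous B2 x B2 row-major inner blocks. */
static void kernel(const double *a, const double *b, double *c) {
    for (size_t i = 0; i < B2; i++)
        for (size_t k = 0; k < B2; k++)
            for (size_t j = 0; j < B2; j++)
                c[i * B2 + j] += a[i * B2 + k] * b[k * B2 + j];
}

/* C += A * B with all three matrices in the two-level block layout.
   Every inner block passed to kernel() is one contiguous chunk of memory. */
static void dbl_gemm(const double *A, const double *B, double *C) {
    size_t nb1 = N / B1, nb2 = B1 / B2;
    for (size_t I = 0; I < nb1; I++)
        for (size_t J = 0; J < nb1; J++)
            for (size_t K = 0; K < nb1; K++) {
                const double *Ao = A + (I * nb1 + K) * B1 * B1;
                const double *Bo = B + (K * nb1 + J) * B1 * B1;
                double *Co = C + (I * nb1 + J) * B1 * B1;
                for (size_t i = 0; i < nb2; i++)
                    for (size_t j = 0; j < nb2; j++)
                        for (size_t k = 0; k < nb2; k++)
                            kernel(Ao + (i * nb2 + k) * B2 * B2,
                                   Bo + (k * nb2 + j) * B2 * B2,
                                   Co + (i * nb2 + j) * B2 * B2);
            }
}

/* Multiply two small test matrices both ways and return the largest
   absolute difference between the block-layout result and a naive one. */
static double max_abs_error_vs_naive(void) {
    assert(N % B1 == 0 && B1 % B2 == 0);
    double A[N * N], B[N * N], ref[N * N] = {0}, out[N * N];
    double Ab[N * N], Bb[N * N], Cb[N * N] = {0};
    for (size_t i = 0; i < N * N; i++) {
        A[i] = (double)(i % 7) - 3.0;
        B[i] = (double)(i % 5) * 0.5;
    }
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                ref[i * N + j] += A[i * N + k] * B[k * N + j];
    to_block(A, Ab);
    to_block(B, Bb);
    dbl_gemm(Ab, Bb, Cb);
    from_block(Cb, out);
    double err = 0.0;
    for (size_t i = 0; i < N * N; i++) {
        double d = out[i] - ref[i];
        if (d < 0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

Because each B2 × B2 inner block occupies consecutive addresses, the kernel walks through contiguous memory, which is the kind of locality that reduces cache and TLB misses. The to_block and from_block conversions also illustrate why such an algorithm works directly only on matrices already stored in block layout: a matrix in the standard layout must be repacked first.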