Journal "Software Engineering"
A journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 1, 2020
The paper provides an overview of implementations of the MPI parallelization tools, focused primarily on the field of high-performance computing (HPC). The low-level communication software used in various MPI implementations (uDAPL, OFED, OFI, etc.), which significantly affects the achievable performance, is considered. For the MPI implementations most widely used in HPC (Open MPI, MVAPICH2, Intel MPI), the paper analyzes the performance achieved by their modern versions on high-speed interconnects with remote direct memory access (RDMA) support (100 Gbit Ethernet with RoCE/iWARP, Intel Omni-Path, InfiniBand EDR) connecting the most widely used compute nodes based on x86-64 processors. The performance characteristics considered include the achieved throughput and message transmission latency, including data obtained with the widely used MPI microbenchmark suites OMB (OSU Micro-Benchmarks, Ohio State University) and IMB (Intel MPI Benchmarks), taking into account their dependence on the number of messages and message sizes. These latency and throughput data also characterize the actual performance of the interconnect hardware itself. Particular attention is paid in the review to one-sided RMA communications, supported since MPI-2.0, which help to increase performance and support promising PGAS parallelization models. In contrast, widely used benchmarks based on full MPI applications, such as SPEC MPI 2007, depend on a huge number of hardware and software parameters that are not related only to interconnects and parallelization tools; they are therefore less suitable for such an analysis and are not examined in the paper.
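
For readers unfamiliar with the one-sided communications mentioned above, the following minimal sketch (not taken from the paper under review) illustrates the MPI-2 RMA pattern whose latency is measured by microbenchmarks such as the OSU put-latency test; the message size, iteration count and fence-based synchronization used here are illustrative assumptions, not the benchmarks' actual parameters.

    /* Minimal MPI one-sided (RMA) sketch: rank 0 repeatedly puts a small
       message into a window exposed by rank 1 and reports the average time.
       Message size and iteration count are illustrative assumptions. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int msg_size = 8;      /* bytes per MPI_Put (assumed) */
        const int iters    = 1000;   /* timed iterations (assumed)  */
        int rank, size;
        char *buf;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* Expose a region of local memory for remote access. */
        MPI_Alloc_mem(msg_size, MPI_INFO_NULL, &buf);
        MPI_Win_create(buf, msg_size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            MPI_Win_fence(0, win);                 /* open access epoch  */
            if (rank == 0)
                MPI_Put(buf, msg_size, MPI_BYTE, 1, 0,
                        msg_size, MPI_BYTE, win);
            MPI_Win_fence(0, win);                 /* close access epoch */
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg one-sided put time: %.2f us\n",
                   (t1 - t0) * 1e6 / iters);

        MPI_Win_free(&win);
        MPI_Free_mem(buf);
        MPI_Finalize();
        return 0;
    }

On RDMA-capable interconnects such as InfiniBand EDR, Omni-Path or RoCE/iWARP Ethernet, such MPI_Put operations can be serviced largely by the network adapter without involving the target CPU, which is what makes one-sided communication attractive for PGAS-style parallelization models.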