Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Issue No. 4, 2017

DOI: 10.17587/prin.8.147-160
Using Container Virtualization in High Performance Computing with SLURM Scheduler
D. S. Nikolaev, dmitry.s.nikolaev@gmail.com, V. V. Korneev, korv@rdi-kvant, Scientific Research Institute Kvant, Moscow, 125438, Russian Federation
Corresponding author: Korneev Victor V., Principal Researcher, Moscow, 125438, Russian Federation, E-mail: korv@rdi-kvant
Received on January 24, 2017
Accepted on February 06, 2017

This paper addresses the use of container virtualization to isolate parallel tasks running on diskless compute nodes of a heterogeneous HPC cluster managed by the SLURM job scheduler. The authors propose an approach to creating user-defined virtual environments with specified computing resource requirements and suggest a model of a heterogeneous HPC cluster. A method of using a modified CoreOS operating system with an integrated SLURM scheduler is described. Diskless compute nodes boot CoreOS via PXE. Initial configuration of OS-level items and mounts is performed with Ignition and Cloud-Config declarations. Cluster shared storage is based on the NFS distributed file system protocol. Users' home directories are shared across cluster nodes, and each node has its own container storage on the shared file system. Different isolated tasks can share one compute node and use a combination of local Linux bridges and VXLAN to overlay container-to-container communication on the physical network infrastructure. Using privileged Docker containers allows users to load drivers as kernel modules to support the different hardware configurations of compute nodes. Isolated tasks can use InfiniBand communication or CUDA if the appropriate devices are assigned to their containers. Privileged containers raise some security concerns; otherwise, container virtualization does not introduce additional security risks. In conclusion, the authors provide performance measurements illustrating how various tasks interfere with each other and outline topics for further research.

Keywords: container virtualization, HPC, cluster, CoreOS, Docker, SLURM, MPI, InfiniBand, CUDA
pp. 147–160
For citation:
Nikolaev D. S., Korneev V. V. Using Container Virtualization in High Performance Computing with SLURM Scheduler, Programmnaya Ingeneria, 2017, vol. 8, no. 4, pp. 147–160.