Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 4, 2017
This paper addresses the use of container virtualization to isolate parallel tasks running on diskless compute nodes of a heterogeneous HPC cluster managed by the SLURM job scheduler. The author proposes an approach for creating user-defined virtual environments with specific computing resource requirements and suggests a model of a heterogeneous HPC cluster. A method of using a modified CoreOS operating system with an integrated SLURM scheduler is described. Diskless compute nodes boot CoreOS via PXE. Initial configuration of OS-level items and mounts is performed with Ignition and Cloud-Config declarations. Cluster shared storage is based on the NFS distributed file system protocol. Users' home directories are shared across cluster nodes, and each node has its own container storage on the shared file system. Different isolated tasks can share one compute node and use a combination of local Linux bridges and VXLAN to overlay container-to-container communication on the physical network infrastructure. Using privileged Docker containers allows a user to load drivers as kernel modules in order to support the different hardware configurations of compute nodes. Isolated tasks can use InfiniBand communication or CUDA if the appropriate devices are assigned to their containers. Privileged containers raise some security concerns; apart from that, container virtualization does not introduce additional security risks. The paper concludes with performance measurements that illustrate how tasks interfere with each other and outlines topics for further research.