Research on fusion scheduling based on Slurm and Kubernetes
18 November 2024
Banghua Wu, Menglong Hu, Shuaibing Qin, Jinliang Jiang
Proceedings Volume 13403, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024); 134031R (2024) https://doi.org/10.1117/12.3051639
Event: International Conference on Algorithms, High Performance Computing, and Artificial Intelligence, 2024, Zhengzhou, China
Abstract
Slurm is a resource management and job scheduling system widely used in parallel computing. Kubernetes is an open-source container orchestration platform widely used in cloud-native and AI workloads. As parallel computing, AI, and large-scale data processing have developed, the computing-resource demands of business scenarios have become more complex and diverse. To better meet these demands and exploit the strengths of each system in its own domain, this paper proposes a fusion scheduling solution based on Slurm and Kubernetes. The solution targets two scenarios: partitioned deployment and hybrid deployment. Fusion scheduling between the two systems is implemented through Unify, a heterogeneous resource manager developed for this purpose. For partitioned deployment, Unify provides dynamic node management; for hybrid deployment, it provides a unified resource view and unified scheduling. Application results show that the solution effectively solves the problem of dynamically dividing nodes under partitioned deployment and the resource scheduling conflicts between Slurm and Kubernetes under hybrid deployment. With this solution, both systems can serve more complex demand scenarios, and the overall resource utilization of the cluster improves.
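The abstract does not describe how Unify is built, so the following is only a minimal sketch of the general idea behind a unified resource view. It assumes that hybrid deployment means both schedulers place work on the same nodes. The names `Node`, `UnifiedResourceView`, `reserve`, and `release` are hypothetical and do not come from the paper, Slurm, or Kubernetes. The sketch shows a single ledger that both schedulers consult before placing work, so neither can claim CPUs the other already holds.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One physical node shared by both schedulers (assumed hybrid deployment)."""
    name: str
    total_cpus: int
    # CPUs currently claimed, keyed by the scheduler that claimed them.
    used: dict = field(default_factory=lambda: {"slurm": 0, "kubernetes": 0})

    def free_cpus(self) -> int:
        """CPUs not claimed by either scheduler."""
        return self.total_cpus - sum(self.used.values())


class UnifiedResourceView:
    """Hypothetical single ledger consulted by both schedulers before placement."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def reserve(self, scheduler: str, node_name: str, cpus: int) -> bool:
        """Claim CPUs for a scheduler; refuse if the node would be oversubscribed."""
        node = self.nodes[node_name]
        if scheduler not in node.used:
            raise ValueError(f"unknown scheduler: {scheduler}")
        if cpus <= 0 or cpus > node.free_cpus():
            return False
        node.used[scheduler] += cpus
        return True

    def release(self, scheduler: str, node_name: str, cpus: int) -> None:
        """Return CPUs to the shared pool, never dropping below zero."""
        node = self.nodes[node_name]
        node.used[scheduler] = max(0, node.used[scheduler] - cpus)


view = UnifiedResourceView([Node("node01", total_cpus=32)])
slurm_ok = view.reserve("slurm", "node01", 24)          # fits: 24 of 32 CPUs
k8s_ok = view.reserve("kubernetes", "node01", 16)       # refused: only 8 CPUs left
k8s_retry_ok = view.reserve("kubernetes", "node01", 8)  # fits: uses the remaining 8
```

Without a shared ledger of this kind, each scheduler would track only its own allocations and could believe all 32 CPUs were available, which is the sort of conflict the abstract attributes to hybrid deployment.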
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Banghua Wu, Menglong Hu, Shuaibing Qin, and Jinliang Jiang, "Research on fusion scheduling based on Slurm and Kubernetes," Proc. SPIE 13403, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024), 134031R (18 November 2024); https://doi.org/10.1117/12.3051639
KEYWORDS
Parallel computing
Artificial intelligence
Design
Computing systems
Education and training
Computer architecture
Scientific research