19 July 2024 Memory-aware parallelism setting algorithm in the spark platform
Bohan Li, Xin He, Junyang Yu, Hangyu Gu, Shunjie Pan
Proceedings Volume 13181, Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024); 131816N (2024) https://doi.org/10.1117/12.3031168
Event: Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024), 2024, Beijing, China
Abstract
As one of the primary platforms for parallel computing, Spark plays a crucial role in enhancing the performance of large-scale parallel processing of big data. Spark task scheduling often overlooks memory, leaving users to determine the number of concurrent task threads. Both the default parallelism parameters and those set by users for different algorithms or datasets may fail to achieve the maximum computational efficiency of the cluster. To address this issue, we first conduct a comprehensive analysis of the Spark job execution process, establish a job scheduling model, and propose a computation cost estimation for task execution stages. Subsequently, we analyze the impact of task parallelism on memory utilization and task execution efficiency, demonstrating its significant influence on various algorithms and datasets. Finally, we introduce the Memory-Aware Parallelism Setting Algorithm (MAPS), which is designed based on the model and controls task parallelism in real time through memory sampling to achieve optimal computational efficiency. The MAPS algorithm executes iteratively across the stages of a job, optimizing scheduling strategies based on the computational environment to enhance performance. Experimental results indicate that the MAPS algorithm improves the job execution efficiency of the Spark framework and is applicable across different types of jobs.
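The abstract does not give the details of MAPS, so the following is only an illustrative sketch of the general idea of adjusting task parallelism from memory samples. It is not the authors' algorithm. It uses a simple additive-increase, multiplicative-decrease rule, and every name and threshold in it (`ParallelismController`, `low_water`, `high_water`, `max_tasks`, `simulate`) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParallelismController:
    """Hypothetical feedback controller; NOT the MAPS algorithm from the paper."""
    min_tasks: int = 1
    max_tasks: int = 8        # assumed: cores available to one executor
    low_water: float = 0.50   # assumed threshold on the used-memory fraction
    high_water: float = 0.85  # assumed threshold on the used-memory fraction

    def next_parallelism(self, current: int, used_fraction: float) -> int:
        """Return the task concurrency to use for the next stage."""
        if not 0.0 <= used_fraction <= 1.0:
            raise ValueError("used_fraction must lie in [0, 1]")
        if used_fraction > self.high_water:
            # Memory pressure: halve concurrency (multiplicative decrease).
            proposed = current // 2
        elif used_fraction < self.low_water:
            # Headroom available: add one concurrent task (additive increase).
            proposed = current + 1
        else:
            # Within the target band: keep the current setting.
            proposed = current
        # Clamp to the allowed range.
        return max(self.min_tasks, min(self.max_tasks, proposed))


def simulate(samples, start=4, controller=None):
    """Apply the controller to one memory sample per stage and record the result."""
    controller = controller or ParallelismController()
    history = []
    current = start
    for used in samples:
        current = controller.next_parallelism(current, used)
        history.append(current)
    return history
```

In this sketch each stage contributes one memory sample, and the concurrency for the next stage is derived from it. A real implementation would presumably also use the job scheduling model and cost estimation that the abstract mentions, which are not reproduced here.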
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Bohan Li, Xin He, Junyang Yu, Hangyu Gu, and Shunjie Pan "Memory-aware parallelism setting algorithm in the spark platform", Proc. SPIE 13181, Third International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2024), 131816N (19 July 2024); https://doi.org/10.1117/12.3031168
KEYWORDS
Distributed computing
Parallel computing
Control systems
Data processing
Analytical research
Computing systems
Data analysis
