[This article belongs to Volume - 52, Issue - 05]
Gongcheng Kexue Yu Jishu/Advanced Engineering Science
Journal ID : AES-16-10-2021-45

Title : A Smart Optimization Method for Spark Jobs' Configuration Parameters
Ruan S, Pan F, Chen X, Luo Y, Wu T

Abstract :

Apache Spark is an open-source distributed framework for big data processing. Spark's performance depends heavily on its configuration parameters. Because the number of parameters is large, obtaining the best performance from Spark remains a major challenge. The parameters are usually tuned manually through experimentation, which is inefficient, and they must be re-tuned for each application. In this work, a machine-learning-based method is proposed and developed to tune Spark parameters automatically. The results show that performance improves by 33.4% on average compared with the default configuration.
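
The abstract does not say which model or search procedure is used, so the following is only an illustrative sketch of what model-guided parameter tuning can look like. It is not the authors' method. The property names are real Spark settings, but the value ranges, the `synthetic_runtime` cost function (a stand-in for actually running a job), and the nearest-neighbour surrogate are all assumptions made for this example.

```python
import math
import random

# Real Spark property names; the candidate values are illustrative assumptions.
SPACE = {
    "spark.executor.cores": [1, 2, 4, 8],
    "spark.executor.memory": [1, 2, 4, 8, 16],  # gigabytes, e.g. passed as "4g"
    "spark.sql.shuffle.partitions": [50, 100, 200, 400, 800],
}

# Baseline used for comparison (assumed here; actual defaults depend on the deployment).
DEFAULT = {
    "spark.executor.cores": 1,
    "spark.executor.memory": 1,
    "spark.sql.shuffle.partitions": 200,
}


def synthetic_runtime(cfg):
    """Hypothetical stand-in for running a Spark job and timing it (seconds)."""
    cores = cfg["spark.executor.cores"]
    mem = cfg["spark.executor.memory"]
    parts = cfg["spark.sql.shuffle.partitions"]
    return 100.0 / cores + 40.0 / mem + abs(parts - 400) / 20.0


def features(cfg):
    """Encode a configuration as normalised positions within each value list."""
    return [SPACE[k].index(cfg[k]) / (len(SPACE[k]) - 1) for k in SPACE]


def predict(cfg, history, k=3):
    """k-nearest-neighbour surrogate: average runtime of the k closest measured configs."""
    target = features(cfg)
    nearest = sorted(history, key=lambda h: math.dist(target, features(h[0])))[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def sample(rng):
    """Draw one random configuration from the search space."""
    return {k: rng.choice(values) for k, values in SPACE.items()}


def tune(measure, n_initial=8, n_rounds=5, n_candidates=50, seed=0):
    """Measure a few random configs, then repeatedly measure the config
    that the surrogate predicts to be fastest. Returns (best_config, runtime)."""
    rng = random.Random(seed)
    history = [(DEFAULT, measure(DEFAULT))]  # always include the baseline
    for _ in range(n_initial):
        cfg = sample(rng)
        history.append((cfg, measure(cfg)))
    for _ in range(n_rounds):
        candidates = [sample(rng) for _ in range(n_candidates)]
        pick = min(candidates, key=lambda c: predict(c, history))
        history.append((pick, measure(pick)))
    return min(history, key=lambda h: h[1])


if __name__ == "__main__":
    best_cfg, best_time = tune(synthetic_runtime)
    print("baseline runtime:", synthetic_runtime(DEFAULT))
    print("best runtime found:", best_time)
    print("best configuration:", best_cfg)
```

In a real setting, `measure` would submit the job with the chosen properties (for example via `--conf` options to `spark-submit`) and return the observed wall-clock time. Because each measurement is expensive, the point of the surrogate is to spend those runs on configurations that look promising.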