A novel reinforcement learning-based multi-operator differential evolution with cubic spline for the path planning problem

Reda, Mohamed (ORCID: 0000-0002-6865-1315), Onsy, Ahmed (ORCID: 0000-0003-0803-5374), Haikal, Amira Y. and Ghanbari, Ali (ORCID: 0000-0003-1087-8426) (2025) A novel reinforcement learning-based multi-operator differential evolution with cubic spline for the path planning problem. Artificial Intelligence Review, 58 (5). ISSN 0269-2821

Full text: PDF (Version of Record, 10MB), Published Version, available under a Creative Commons Attribution license.

Official URL: https://doi.org/10.1007/s10462-025-11129-6

Abstract

Path planning in autonomous driving systems remains a critical challenge, requiring algorithms capable of generating safe, efficient, and reliable routes. Existing state-of-the-art methods, including graph-based and sampling-based approaches, often produce sharp, suboptimal paths and struggle in complex search spaces, while trajectory-based algorithms suffer from high computational costs. Meta-heuristic optimization algorithms have recently shown promising performance but often lack learning ability due to their inherent randomness. This paper introduces a unified benchmarking framework, named Reda's Path Planning Benchmark 2024 (RP2B-24), alongside two novel reinforcement learning (RL)-based path-planning algorithms: Q-Spline Multi-Operator Differential Evolution (QSMODE), which uses Q-learning (Q-tables), and Deep Q-Spline Multi-Operator Differential Evolution (DQSMODE), which is based on deep Q-networks (DQN). Both algorithms are integrated under a single framework and enhanced with cubic spline interpolation to improve path smoothness and adaptability. The proposed RP2B-24 library comprises 50 distinct benchmark problems, offering a comprehensive and generalizable testing ground for diverse path-planning algorithms. Unlike traditional approaches, RL in QSMODE/DQSMODE is not merely a parameter-tuning mechanism but is used directly to generate paths from accumulated search experience, improving path quality. QSMODE/DQSMODE introduces a self-training update mechanism for the Q-table and DQN based on candidate paths within the algorithm's population, complemented by a secondary update mode that increases population diversity through random action selection. An adaptive RL switching probability dynamically alternates between these two update modes. DQSMODE and QSMODE outperformed 22 state-of-the-art algorithms, including IMODEII, ranking first and second in the Friedman and SNE-SR ranking tests with scores of 99.2877 (DQSMODE) and 93.0463 (QSMODE), and the results were statistically significant under the Wilcoxon test. The practical applicability of the algorithm was validated on a ROS-based system using a four-wheel differential drive robot, which successfully followed the planned paths in two driving scenarios, demonstrating the approach's feasibility and effectiveness in real-world settings. The source code for the proposed benchmark and algorithm is publicly available for further research and experimentation at: https://github.com/MohamedRedaMu/RP2B24-Benchmark and https://github.com/MohamedRedaMu/QSMODEAlgorithm.
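To make the main components described in the abstract concrete, the sketches below illustrate them in Python. First, the differential evolution core: a multi-operator DE maintains a pool of mutation strategies and applies one of them per trial vector. The paper's actual operator pool is not listed here, so the two classic operators shown (DE/rand/1 and DE/current-to-best/1), along with all function and parameter names, are illustrative assumptions.

```python
import numpy as np

def de_mutate(pop, fit, i, F=0.5, op="rand/1", rng=None):
    """Apply one of two classic DE mutation operators to individual i.

    pop: (N, d) array of candidate solutions (e.g. flattened waypoint lists)
    fit: length-N array of objective values (lower is better)
    """
    rng = rng or np.random.default_rng()
    others = np.array([j for j in range(len(pop)) if j != i])
    r1, r2, r3 = rng.choice(others, size=3, replace=False)
    if op == "rand/1":
        # DE/rand/1: random base vector, favors exploration.
        return pop[r1] + F * (pop[r2] - pop[r3])
    # DE/current-to-best/1: pulls individual i toward the current best,
    # favoring exploitation.
    best = pop[np.argmin(fit)]
    return pop[i] + F * (best - pop[i]) + F * (pop[r1] - pop[r2])

# Toy usage: 20 candidate paths, each encoded as 3 (x, y) waypoints.
rng = np.random.default_rng(0)
pop = rng.uniform(0.0, 10.0, size=(20, 6))
fit = pop.sum(axis=1)  # stand-in objective; the real one scores path cost
trial = de_mutate(pop, fit, i=0, op="current-to-best/1", rng=rng)
```

A multi-operator scheme would then credit whichever operator produced successful trial vectors and bias future operator selection accordingly.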
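Second, the two Q-table update modes: self-training on transitions extracted from candidate paths in the DE population, and a diversity-preserving mode that selects random actions, with an adaptive switching probability deciding between them. The following minimal tabular sketch is an assumption about how such a scheme can be organized; the state and action encoding, reward values, and class names are not taken from the paper.

```python
import random
from collections import defaultdict

class QTablePlanner:
    """Minimal tabular Q-learning with two update modes (illustrative)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, switch_prob=0.5):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # state -> action values
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.switch_prob = switch_prob  # adapted online in the paper's scheme

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])

    def self_train_on_path(self, transitions):
        # Mode 1: replay (state, action, reward, next_state) transitions
        # extracted from a candidate path in the population.
        for s, a, r, s_next in transitions:
            self.update(s, a, r, s_next)

    def choose_action(self, state):
        # Mode 2: with probability switch_prob take a random action to keep
        # the population diverse; otherwise act greedily on the Q-table.
        if random.random() < self.switch_prob:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[state][a])

# Toy usage: grid states as (row, col) tuples, 8 compass-direction actions.
planner = QTablePlanner(n_actions=8)
planner.self_train_on_path([((0, 0), 1, -1.0, (0, 1)), ((0, 1), 1, 10.0, (0, 2))])
next_action = planner.choose_action((0, 0))
```

In the DQSMODE variant, the table is replaced by a deep Q-network that approximates the same action values.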
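Finally, cubic spline interpolation, which densifies the sparse waypoints an optimizer produces into a smooth trajectory. This sketch uses SciPy's CubicSpline with chord-length parameterization; the parameterization choice is an assumption, since the abstract only states that cubic splines are used for smoothing.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_path(waypoints, samples=100):
    """Densify sparse (x, y) waypoints into a smooth path.

    The spline is parameterized by cumulative chord length so that x and y
    can each be interpolated independently along the path.
    """
    pts = np.asarray(waypoints, dtype=float)
    seg = np.diff(pts, axis=0)
    t = np.concatenate(([0.0], np.cumsum(np.linalg.norm(seg, axis=1))))
    sx, sy = CubicSpline(t, pts[:, 0]), CubicSpline(t, pts[:, 1])
    ts = np.linspace(t[0], t[-1], samples)
    return np.column_stack((sx(ts), sy(ts)))

# Toy usage: four waypoints densified into 100 smooth path points.
path = smooth_path([(0, 0), (2, 1), (4, 0), (6, 3)])
```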

