TY - JOUR
T1 - Optimization of reward shaping function based on genetic algorithm applied to a cross validated deep deterministic policy gradient in a powered landing guidance problem
AU - Nugroho, Larasmoyo
AU - Andiarti, Rika
AU - Akmeliawati, Rini
AU - Kutay, Ali Türker
AU - Larasati, Diva Kartika
AU - Wijaya, Sastra Kusuma
N1 - Funding Information:
This research was supported by the Degree-by-Research Doctoral Scholarship from LIPI Indonesia, with contract number SK Kepala LIPI No. 59/H/2020, to LN as recipient. The PUTI Q1 grant was awarded by Universitas Indonesia, with contract number NKB-483/UN2.RST/HKP.05.00/2022, to SKW. The Research Grant of Rumah Program BRIN ORPA 2022, with contract number SK Kepala Organisasi Riset Penerbangan Dan Antariksa No. 15/III/HK/2022, was awarded to LN. The authors would like to thank Dr. Prawito Prajitno, Dr. Djati Handoko, Dr. Arief Sudarmadji, and Dr. Adhi Harmoko Saputro for their valuable inputs in the analysis and interpretation of the data presented.
Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2023/4
Y1 - 2023/4
AB - A major capability of a Deep Reinforcement Learning (DRL) agent controlling a specific vehicle in an environment without any prior knowledge is decision-making based on a well-designed reward shaping function. The reward shaping function is an important but little-studied factor that can significantly alter the training reward score and performance outcomes. To maximize the control efficacy of a DRL algorithm, an optimized reward shaping function and a solid hyperparameter combination are essential. To achieve optimal control during the powered descent guidance (PDG) landing phase of a reusable launch vehicle, a genetic algorithm (GA) is used in this paper to discover the best shape of the reward shaping function (RSF) for a Deep Deterministic Policy Gradient (DDPG) agent. Although DDPG is quite capable of managing complex environments and producing actions in continuous spaces, its state and action performance can still be improved. A reference DDPG agent with the original reward shaping function and a PID controller were compared side by side with a GA-DDPG agent using the GA-optimized RSF. With the help of the RSF found by the potential-based GA (PbGA) search, the best GA-DDPG individual maximizes overall rewards and minimizes state errors, maintaining the highest fitness score among all individuals after being cross-validated and retested extensively in Monte-Carlo experiments.
KW - DDPG
KW - Fitness
KW - GA-search
KW - Reusable launch vehicle
KW - Reward shaping function
UR - http://www.scopus.com/inward/record.url?scp=85146696067&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2022.105798
DO - 10.1016/j.engappai.2022.105798
M3 - Article
AN - SCOPUS:85146696067
SN - 0952-1976
VL - 120
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 105798
ER -