Abstract: This paper presents a new inverse reinforcement learning algorithm that uses a full-order model to directly solve the optimal control problem for singularly perturbed systems with two-time-scale characteristics. Compared with the traditional composite control method, which decomposes the original singularly perturbed system into a fast time-scale subsystem and a slow time-scale subsystem, the proposed approach reduces the complexity of solving the problem. First, a model-based policy iteration inverse reinforcement learning algorithm is designed to reconstruct the unknown cost function from the system dynamics and the optimal control policy gain. On this basis, a model-free off-policy inverse reinforcement learning algorithm is adopted that relies only on the optimal behavior data exhibited by the system and can accurately reconstruct the cost function without prior knowledge of the system dynamics model or the optimal control policy gain, so that the system can imitate the optimal behavior and realize unbiase