A Dynamic Planning Method for Mobile Robot Based on Adaptive State Aggregating Q-Learning
Author: Wang Hui, Song Changtong
Affiliation:

(1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China; 2. Department of Electronic Information, Zhenjiang College, Zhenjiang 212000, China)

Author biography:

Wang Hui (b. 1980), female, from Danyang, Jiangsu; lecturer and master's candidate; her main research interests are virtual reality and artificial intelligence.

CLC number:

TP393

Funding:

Natural Science Research Program of Jiangsu Higher Education Institutions (03kjd520075).


A Dynamic Planning Method for Mobile Robot Based on Adaptive State Aggregating Q-Learning
Author:
Affiliation:

(1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China; 2. Department of Electronic Information, Zhenjiang College, Zhenjiang 212000, China)

    Abstract:

    To address the slow convergence and poor support for online planning of existing path planning methods for mobile robots, this paper studies SQ(λ), a dynamic path planning method for mobile robots that combines a state-aggregating SOM network with Q-learning with eligibility traces. First, the overall closed-loop planning model of the system is designed, dividing the system into a front end (state aggregation) and a back end (path planning). Then, an output layer is added on top of the traditional SOM to build a three-layer SOM network that aggregates the mobile robot's states, and a training algorithm for this three-layer network is given. Finally, over the aggregated states, an improved Q-learning algorithm with eligibility traces and an adaptively varying exploration factor is proposed to obtain the optimal policy; the number of neurons in the front-end SOM output layer is increased or decreased adaptively according to the convergence speed of the improved Q-learning algorithm, which improves the convergence of the algorithm as a whole. Simulation experiments show that the proposed SQ(λ) performs path planning for mobile robots effectively and, compared with other algorithms, converges faster and is better at finding optimal paths.
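The front end described above maps the robot's continuous state onto a small set of discrete indices. As a rough illustration of that idea only (not the paper's exact three-layer SOM or its training algorithm), the following Python sketch aggregates states with a winner-take-all prototype layer and exposes the kind of grow/prune hooks the back end could drive; the class name, update rule, and pruning heuristic are all assumptions.

import numpy as np

class SOMAggregator:
    # Hypothetical stand-in for the paper's three-layer SOM front end:
    # maps a continuous robot state to the index of its best-matching
    # prototype neuron and adapts the prototype set online.
    def __init__(self, state_dim, n_units=16, lr=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.random((n_units, state_dim))   # prototype vectors
        self.lr = lr
        self.wins = np.zeros(n_units)               # win counts, used for pruning

    def aggregate(self, x):
        # Winner-take-all: return the discrete state index for x and pull
        # the winning prototype toward the observed state.
        x = np.asarray(x, dtype=float)
        bmu = int(np.linalg.norm(self.W - x, axis=1).argmin())
        self.W[bmu] += self.lr * (x - self.W[bmu])
        self.wins[bmu] += 1
        return bmu

    def grow(self, x):
        # Add a prototype at x, e.g. when the back-end learner converges slowly.
        self.W = np.vstack([self.W, np.asarray(x, dtype=float)])
        self.wins = np.append(self.wins, 0.0)

    def prune(self):
        # Drop the least-used prototype to keep the aggregated state space small.
        idx = int(self.wins.argmin())
        self.W = np.delete(self.W, idx, axis=0)
        self.wins = np.delete(self.wins, idx)
        return idx

Note that whenever the prototype set grows or shrinks, the back-end Q-table must gain or lose the corresponding row; this is the coupling the abstract refers to when it ties the neuron count to the convergence of the improved Q-learning algorithm.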

    Abstract:

    To overcome the slow convergence and weak online-planning ability of existing path planning methods for mobile robots, a dynamic path planning method based on a state-aggregating SOM network and Q-learning is studied. Firstly, the planning model of the whole system is designed and divided into a front end (state aggregation) and a back end (path planning). Then a three-layer SOM network is developed from the traditional SOM to aggregate the states, and a training algorithm for the three-layer SOM network is given. Finally, an algorithm for obtaining the optimal policy based on eligibility traces and an adaptively changing exploration factor is proposed, in which the number of output nodes of the SOM is increased or decreased adaptively according to the convergence of Q(λ), so that the convergence of the whole improved algorithm is guaranteed. Simulation experiments show that the designed method realizes path planning effectively and, compared with other methods, converges faster and is better able to obtain the optimal solution.
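The back end is a Q-learning variant with eligibility traces over the aggregated states. As a minimal sketch (not the authors' exact SQ(λ) update, and with a plain decay schedule standing in for their adaptive exploration factor), the following Python function implements Watkins-style Q(λ) with accumulating traces; the reset/step environment callbacks and all hyperparameters are hypothetical.

import numpy as np

def q_lambda(n_states, n_actions, reset, step, episodes=300,
             alpha=0.1, gamma=0.95, lam=0.9,
             eps0=0.5, eps_min=0.05, decay=0.99):
    # reset() -> initial state index; step(s, a) -> (next_state, reward, done).
    # States are the discrete indices produced by the front-end aggregator.
    Q = np.zeros((n_states, n_actions))
    eps = eps0
    for _ in range(episodes):
        E = np.zeros_like(Q)                # eligibility traces
        s = reset()
        done = False
        while not done:
            greedy_a = int(Q[s].argmax())
            a = np.random.randint(n_actions) if np.random.rand() < eps else greedy_a
            if a != greedy_a:
                E[:] = 0.0                  # Watkins' cut: traces do not survive exploration
            s2, r, done = step(s, a)
            delta = r + gamma * Q[s2].max() * (not done) - Q[s, a]
            E[s, a] += 1.0                  # accumulating trace
            Q += alpha * delta * E          # update every traced state-action pair
            E *= gamma * lam                # decay all traces
            s = s2
        eps = max(eps_min, eps * decay)     # shrink the exploration factor
    return Q

In the paper's scheme the exploration factor varies adaptively rather than on a fixed schedule, and the measured convergence speed of this learner is what drives the growth or pruning of the front-end SOM output layer.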

Cite this article

Wang Hui, Song Changtong. A dynamic planning method for mobile robot based on adaptive state aggregating Q-learning [J]. Computer Measurement & Control, 2014, 22(10): 3419-3422.

History
  • Online publication date: 2015-01-15