Abstract: To account for the impact that state selection under the current environmental perception has on subsequent behaviors, a reinforcement learning method for knowledge graph recommendation is proposed. The method constructs a reinforcement learning framework and presents a dual-reward-driven strategy that combines short-term rewards with long-term incremental evaluations to improve the global reasoning ability of the actor-critic network. It also forms a self-supervised path inference scheme in which an enhanced loss constraint serves as the supervision signal for policy gradient updates. Taking knowledge recommendation as an example, experimental results show that the model achieves higher recommendation accuracy and average reward than the baseline methods, and that it guides the policy toward recommendation paths that provide effective explanations, offering decision support for path reasoning.