Abstract: To address more complex problems and enhance the effectiveness of reasoning paths, multi-hop reasoning methods utilizing reinforcement learning have gained significant attention recently.