Authors: Yu-An Lin, Chen-Tao Lee, Guan-Ting Liu, Pu-Jen Cheng, Shao-Hua Sun
Published on: November 27, 2023
Impact Score: 8.3
arXiv ID: arXiv:2311.15960
Summary
- What is new: The introduction of the Program Machine Policy (POMP), which combines programmatic RL and state machine policies to address the generalizability and interpretability issues in deep RL, specifically for long-horizon tasks.
- Why this is important: Deep reinforcement learning (deep RL) struggles with generalizability and interpretability, and while existing programmatic RL methods improve interpretability for short-horizon tasks, they cannot handle long-horizon tasks effectively.
- What the research proposes: POMP introduces a novel search method to retrieve effective, diverse, and compatible programs, which then serve as the modes of a state machine; a learned transition function switches among these modes, enabling the policy to handle complex, long-horizon tasks.
- Results: This approach outperforms both programmatic RL and deep RL baselines in various tasks, demonstrating enhanced generalization to longer horizons without fine-tuning. Ablation studies confirm the effectiveness of the search algorithm for program retrieval.
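To make the mode/transition idea concrete, here is a minimal sketch of a program machine policy on a toy 1-D environment. The two "programs" and the transition rule are hypothetical illustrations for exposition; in POMP the modes are retrieved programs and the transition function is learned, not hand-written as here.

```python
# Minimal sketch of a program machine policy: a set of program modes
# plus a transition function that picks which mode acts at each step.
# All programs and the transition rule below are hypothetical examples.

class ProgramMachinePolicy:
    def __init__(self, modes, transition_fn):
        self.modes = modes                  # list of programs: state -> action
        self.transition_fn = transition_fn  # (state, mode_idx) -> new mode_idx
        self.mode_idx = 0

    def act(self, state):
        # First switch mode based on the current state, then execute
        # the selected program to produce an action.
        self.mode_idx = self.transition_fn(state, self.mode_idx)
        return self.modes[self.mode_idx](state)

# Two toy "programs" acting on a 1-D position state.
move_right = lambda s: +1
move_left = lambda s: -1

def transition(state, mode_idx):
    # Hypothetical rule: switch to move_left at position 3, then stay there.
    if mode_idx == 1 or state >= 3:
        return 1
    return 0

policy = ProgramMachinePolicy([move_right, move_left], transition)
state = 0
trajectory = [state]
for _ in range(5):
    state += policy.act(state)
    trajectory.append(state)
# trajectory: [0, 1, 2, 3, 2, 1]
```

The key design point is that each mode is itself an interpretable program, while the state machine layer sequences those programs over a long horizon.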
Technical Details
Technological frameworks used: Deep Reinforcement Learning
Models used: Program Machine Policy (POMP), Programmatic RL, State Machine Policies
Data used: not specified
Potential Impact
Gaming, autonomous vehicles, robotics, and any sector relying on AI for complex decision-making tasks.
Want to implement this idea in a business?
We have generated a startup concept here: FlexAI.