Authors: Washim Uddin Mondal, Vaneet Aggarwal
Published on: October 18, 2023
Impact Score: 8.22
arXiv code: arXiv:2310.11677
Summary
- What is new: Improves the sample complexity of learning algorithms for Markov Decision Processes by a factor that is logarithmic in the optimality error.
- Why this is important: Designing sample-efficient learning algorithms for infinite-horizon discounted-reward Markov Decision Processes is a central challenge in reinforcement learning.
- What the research proposes: The Accelerated Natural Policy Gradient (ANPG) algorithm, which uses accelerated stochastic gradient descent to obtain the natural policy gradient (see the sketch below) without assuming that the variance of the importance sampling weights is upper bounded.
- Results: ANPG achieves an $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity, improving on the previous state-of-the-art complexities.
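For readers who want the mechanics behind the proposal, here is the standard natural policy gradient framing; the notation below is the conventional one from the NPG literature and is our own framing rather than an excerpt from the paper. The update preconditions the policy gradient with the inverse Fisher information matrix,

$$
\theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{-1} \nabla_\theta J(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{s,a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a|s)\, \nabla_\theta \log \pi_\theta(a|s)^{\top}\right],
$$

and the natural gradient direction $w_t = F(\theta_t)^{-1}\nabla_\theta J(\theta_t)$ can equivalently be characterized as a minimizer of a least-squares (compatible function approximation) objective,

$$
w_t \in \arg\min_{w}\; \mathbb{E}_{s,a \sim \pi_{\theta_t}}\!\left[\big(w^{\top}\nabla_\theta \log \pi_{\theta_t}(a|s) - A^{\pi_{\theta_t}}(s,a)\big)^{2}\right].
$$

ANPG's idea, as summarized above, is to solve this inner problem with an accelerated stochastic gradient method instead of plain SGD. With $\mathcal{O}(\epsilon^{-1})$ outer iterations, the stated $\mathcal{O}(\epsilon^{-2})$ sample complexity corresponds on average to $\mathcal{O}(\epsilon^{-1})$ samples per iteration, although the paper's exact per-iteration budget is not given in this summary.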
Technical Details
Technological frameworks used: Accelerated stochastic gradient descent
Models used: Natural policy gradient (see the code sketch after this list)
Data used: Not specified
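To make the two ingredients above concrete, here is a minimal, self-contained Python sketch: an outer natural-policy-gradient loop on a toy two-state MDP, with the natural gradient direction estimated by Nesterov-momentum (accelerated) SGD on the least-squares objective shown earlier. The toy MDP, the centered return-to-go advantage proxy, and all hyperparameters are illustrative assumptions; this is not the authors' implementation and it omits the analysis-specific details of ANPG.

```python
import numpy as np

# Toy two-state, two-action MDP, used only to make the sketch runnable.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']: transition probabilities
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                 # R[s, a]: immediate rewards
              [0.0, 2.0]])
GAMMA, HORIZON = 0.95, 40
rng = np.random.default_rng(0)

def policy(theta):
    """Tabular softmax policy pi[s, a] parameterized by logits theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def grad_log_pi(theta, s, a):
    """Score vector grad_theta log pi(a|s) of the softmax policy, flattened."""
    g = np.zeros_like(theta)
    g[s] -= policy(theta)[s]
    g[s, a] += 1.0
    return g.ravel()

def sample_batch(theta, n_episodes=30):
    """Roll out episodes; return score vectors and centered return-to-go targets
    (a crude Monte Carlo stand-in for the advantage function)."""
    pi = policy(theta)
    feats, targets = [], []
    for _ in range(n_episodes):
        s, traj = int(rng.integers(2)), []
        for _ in range(HORIZON):
            a = rng.choice(2, p=pi[s])
            traj.append((s, a, R[s, a]))
            s = rng.choice(2, p=P[s, a])
        G = 0.0
        for s_t, a_t, r_t in reversed(traj):
            G = r_t + GAMMA * G
            feats.append(grad_log_pi(theta, s_t, a_t))
            targets.append(G)
    targets = np.array(targets)
    return np.array(feats), targets - targets.mean()

def accelerated_sgd(feats, targets, lr=0.05, momentum=0.9, epochs=20):
    """Nesterov-momentum SGD on the least-squares subproblem
    min_w mean_i (feats_i . w - targets_i)^2 / 2; the minimizer approximates
    the natural policy gradient direction (compatible function approximation)."""
    n, d = feats.shape
    w, v = np.zeros(d), np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            look = w + momentum * v                      # Nesterov look-ahead point
            grad = (feats[i] @ look - targets[i]) * feats[i]
            v = momentum * v - lr * grad
            w = w + v
    return w

theta = np.zeros((2, 2))
for it in range(30):                          # outer natural-policy-gradient loop
    feats, targets = sample_batch(theta)
    w = accelerated_sgd(feats, targets)       # inner accelerated-SGD solve
    theta += 0.1 * w.reshape(theta.shape)     # NPG-style update with the estimated direction
    if it % 10 == 0:
        mean_r = float((policy(theta) * R).sum() / 2)
        print(f"iter {it:2d}  mean immediate reward {mean_r:.3f}")
```

In the paper, the gain over earlier NPG-type methods comes from the accelerated inner solver, which removes the need to bound the variance of importance sampling weights; the sketch above only mirrors the algorithmic structure, not the analysis.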
Potential Impact
Gaming, robotics, autonomous vehicles, and any industry relying on complex decision-making processes.