Reinforcement learning has shown notable empirical success in approximating solutions to the Hamilton-Jacobi-Bellman (HJB) equation, yielding highly dynamic controllers. However, because these methods rely on finite sampling and function approximators, neither the suboptimality of the resulting controllers nor the quality of the approximated cost-to-go function can be bounded, and this has limited their broader application.
Consequently,…
