'Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning'

January 25, 2023

“Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner. … Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios to combat the variance of the IS estimator. Unfortunately, once a trace has been fully cut, the effect cannot be reversed. … In this paper, we propose a multistep operator that can express both per-decision and trajectory-aware methods.”
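
The irreversibility the abstract mentions can be seen in a small sketch. This is not the operator proposed in the paper; it is a minimal illustration of the classical per-decision scheme, assuming a Retrace-style rule in which each importance sampling (IS) ratio is clipped at 1 and the trace is the running product of the clipped ratios. The function name `per_decision_traces`, the decay parameter `lam`, and the probabilities are hypothetical values chosen for illustration.

```python
def per_decision_traces(pi_probs, mu_probs, lam=1.0):
    """Running trace under clipped per-decision IS ratios (Retrace-style assumption).

    pi_probs[t]: target-policy probability of the action taken at step t.
    mu_probs[t]: behavior-policy probability of the same action (must be > 0).
    """
    traces = []
    trace = 1.0
    for pi, mu in zip(pi_probs, mu_probs):
        rho = pi / mu              # per-decision IS ratio
        c = lam * min(1.0, rho)    # cut the ratio at 1 to limit variance
        trace *= c                 # the trace is a product of all cuts so far
        traces.append(trace)
    return traces


# Hypothetical data: the target policy gives zero probability to the second action.
pi_probs = [0.9, 0.0, 0.8, 0.8]
mu_probs = [0.5, 0.5, 0.5, 0.5]
traces = per_decision_traces(pi_probs, mu_probs)
print(traces)  # [1.0, 0.0, 0.0, 0.0]
```

Because the trace is a product, a single zero factor removes all later steps from the multistep return, even though the later actions are ones the target policy would readily take.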

Read the paper and see the full list of authors on arXiv.

Christopher Amato

Computer Science
