Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning

Seyeon Kim     Joonhun Lee     Namhoon Cho     Sungjun Han     Wooseop Hwang
AI Research
Qraft Technologies

Abstract

Conventional uncertainty-aware temporal difference (TD) learning often assumes a zero-mean Gaussian distribution for TD errors, leading to inaccurate error representations and compromised uncertainty estimation. We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning. The framework makes error distribution modeling more flexible by incorporating an additional higher-order moment, kurtosis, thereby improving the estimation and mitigation of data-dependent aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression that demonstrates an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme to address epistemic uncertainty by fully leveraging the GGD. We refine batch inverse variance weighting with bias reduction and kurtosis considerations, enhancing robustness. Experiments with policy gradient algorithms demonstrate significant performance gains.
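
To make the role of the shape parameter concrete, the following minimal sketch evaluates the standard textbook moments of a zero-mean GGD with density proportional to exp(-(|x|/alpha)^beta). It is an illustration only: the parameterization, the function names `ggd_variance` and `ggd_excess_kurtosis`, and the choice of variance as the uncertainty measure are assumptions made here, and the closed-form expression derived in the paper may take a different form.

```python
import math


def ggd_variance(alpha: float, beta: float) -> float:
    """Variance of a zero-mean GGD with scale alpha and shape beta.

    Assumes the density is proportional to exp(-(|x| / alpha) ** beta),
    which gives Var = alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    """
    return alpha ** 2 * math.gamma(3.0 / beta) / math.gamma(1.0 / beta)


def ggd_excess_kurtosis(beta: float) -> float:
    """Excess kurtosis of the same GGD; it depends only on the shape beta."""
    g1 = math.gamma(1.0 / beta)
    g3 = math.gamma(3.0 / beta)
    g5 = math.gamma(5.0 / beta)
    return g5 * g1 / g3 ** 2 - 3.0


if __name__ == "__main__":
    # beta = 1 is the Laplace case and beta = 2 the Gaussian case.
    # For a fixed scale, both variance and kurtosis shrink as beta grows.
    for beta in (0.5, 1.0, 2.0, 4.0):
        var = ggd_variance(1.0, beta)
        kurt = ggd_excess_kurtosis(beta)
        print(f"beta={beta:>4}: variance={var:.4f}, excess kurtosis={kurt:.4f}")
```

Under this parameterization and a fixed scale, smaller shape values give heavier tails and larger variance, which is one simple way to read the inverse relationship between uncertainty and the shape parameter described above.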

BibTeX


          @misc{kim2024ggtde,
            title={Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning},
            author={Kim, Seyeon and Lee, Joonhun and Cho, Namhoon and Han, Sungjun and Hwang, Wooseop},
            year={2024}
          }