Solved – Does episodic reinforcement learning still need a discount factor

The discount factor in reinforcement learning determines how much an agent's decisions are influenced by rewards in the distant future, compared with rewards in the near future. My understanding is that there are two main reasons for it. First, rewards in the distant future are more uncertain: there are many steps between the current state and the state where those rewards would be collected, so it is less certain that the agent will actually receive them. Second, the discount factor ensures that the Bellman equation converges: without it, the return is a potentially infinite sum of rewards, which may diverge, and discounting makes the terms shrink geometrically so that the sum remains finite.
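To make the second reason concrete, here is a minimal sketch (the reward stream and the `discounted_return` helper are illustrative, not from any particular environment) showing that with rewards bounded by r_max and γ < 1, the discounted sum can never exceed the geometric bound r_max / (1 − γ):

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

gamma = 0.9
r_max = 1.0
# Approximate an infinite stream of maximal rewards with a long finite one.
rewards = [r_max] * 1000

g = discounted_return(rewards, gamma)
bound = r_max / (1 - gamma)  # geometric-series bound: 10.0 here
print(g, bound)  # g is (just below) 10.0
```

Even though the reward stream never ends, the discounted return converges to the bound, which is what makes the infinite-horizon value well defined.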

However, I am wondering whether all of this changes in episodic reinforcement learning, where the agent executes actions for a fixed number of steps and then returns to its initial state. In this case, only the first reason still seems to apply: the second one no longer holds, because the sum is bounded by the length of the episode and therefore cannot be infinite.
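A small sketch of this point (the episode length and rewards are hypothetical numbers, chosen only to show the effect): with a fixed episode length T, the undiscounted return (γ = 1) is a finite sum bounded by T · r_max, so no discounting is needed to keep it finite:

```python
def episodic_return(rewards, gamma=1.0):
    """Finite (optionally discounted) return over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

T = 10
episode_rewards = [1.0] * T  # hypothetical episode, reward 1.0 at every step

print(episodic_return(episode_rewards))       # 10.0: undiscounted, still finite
print(episodic_return(episode_rewards, 0.9))  # smaller, since gamma < 1 down-weights late rewards
```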

So, in episodic reinforcement learning, should we still use a discount factor because of the first reason I stated above? Or can we drop the discount factor entirely here?

Kris De Asis wrote: The discount factor determines how much weight the value function gives to future rewards. A discount factor of γ = 0 results in state/action values that represent only the immediate reward, while a discount factor closer to 1, such as γ = 0.9, results in values that represent the cumulative discounted future reward an agent expects to receive when behaving under a given policy. Whether convergence requires discounting depends on whether the task is continuing or episodic: in a continuing task, γ must lie in [0, 1), whereas in an episodic task it can lie in [0, 1].
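As an illustration of how different values of γ reweight the same reward sequence (the rewards below are hypothetical, chosen so a large reward sits a few steps in the future):

```python
def value_of(rewards, gamma):
    """Discounted return of a fixed reward sequence from its first state."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 5.0]  # a big reward three steps away

print(value_of(rewards, 0.0))  # 1.0: only the immediate reward counts
print(value_of(rewards, 0.9))  # 1.0 + 0.9**3 * 5.0 = 4.645
print(value_of(rewards, 1.0))  # 6.0: gamma = 1 is fine here, the episode is finite
```

With γ = 0 the distant reward is invisible to the agent; with γ = 1 it counts in full, which is only safe because the sum terminates.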
