# Solved – How state aggregation in reinforcement learning works

I am watching the "Prediction with linear approximation" video in the RL class by Prof. Sutton. He presented the state aggregation approach on a random walk problem. It seems that this approach just aggregates the value functions of different states. However, I thought that it aggregates the states themselves, so that their value functions are aggregated as a consequence. Can someone explain how this approach works? Is discretization of a continuous space the same as state aggregation? If so, then in discretization the states are aggregated first and the value functions are aggregated as a result.


> It seems that this approach just aggregates the value function of different states. However, I thought that this approach aggregates states and therefore, their value functions are aggregated, too. Can someone explain how this approach works?

To the agent, there is no real difference between:

• Aggregating states by representing states $$s_1$$ to $$s_k$$ as a single entity $$x_1$$ then working with a value function $$v(x)$$

• Aggregating inside the value function, so that $$v(s_1)$$ to $$v(s_k)$$ all read from the same single table entry inside $$v(s)$$.

This is an implementation detail which does not change the value of any calculation. The difference would only be noticed in terms of variable names in code or pseudo-code.

Importantly, in both cases, the original environment is not affected by aggregation, and still has whatever state representation fits the full description of the MDP.
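The equivalence of the two options above can be checked with a minimal Python sketch. Everything in it is an assumption made for illustration rather than something taken from the course: the sizes (1000 states in groups of 100), the step size, the random update targets, and the names `group_of`, `update_a` and `AggregatedValue`. Option A converts the state to a group index before touching the table; option B keeps raw states in its interface and hides the same lookup inside the value function.

```python
import random

N_STATES = 1000    # assumed number of raw states
GROUP_SIZE = 100   # assumed group width, giving 10 groups
ALPHA = 0.1        # assumed step size


def group_of(s):
    """Map a raw state index (0..N_STATES-1) to its group index."""
    return s // GROUP_SIZE


# Option A: aggregate the state first, then use a plain table v(x) over groups.
v_over_groups = [0.0] * (N_STATES // GROUP_SIZE)


def update_a(s, target):
    x = group_of(s)  # the learner only ever sees x
    v_over_groups[x] += ALPHA * (target - v_over_groups[x])


# Option B: keep raw states in the interface and aggregate inside v(s).
class AggregatedValue:
    def __init__(self):
        self._w = [0.0] * (N_STATES // GROUP_SIZE)

    def __call__(self, s):
        # All states in a group read the same table entry.
        return self._w[group_of(s)]

    def update(self, s, target):
        i = group_of(s)
        self._w[i] += ALPHA * (target - self._w[i])


v_of_state = AggregatedValue()

# Feed both versions the same (state, target) samples.
# The targets are random placeholders, not returns from a real environment.
rng = random.Random(0)
samples = [(rng.randrange(N_STATES), rng.uniform(-1.0, 1.0)) for _ in range(5000)]
for s, target in samples:
    update_a(s, target)
    v_of_state.update(s, target)

same = all(v_over_groups[group_of(s)] == v_of_state(s) for s in range(N_STATES))
print(same)  # True
```

Both versions perform the same arithmetic in the same order, so the learned values match exactly. Only the place where `group_of` is called differs.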

> Is discretization of a continuous space the same as state aggregation? If yes, in discretization, states are aggregated at first and therefore, value functions are then aggregated.

Yes, it is essentially the same thing conceptually. I would say that the value functions are not aggregated as such; rather, they take the aggregated state as their input. This is required for state aggregation to have any effect: the aggregation is a view of the environment that is used only for calculating value functions or policies.
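As a rough illustration of discretization acting as state aggregation, here is a small Python sketch. The one-dimensional state range `[0, 1)`, the ten equal-width bins, the step size and the helper names `bin_index`, `v` and `update` are all hypothetical choices for this example. The continuous state is left untouched; only the value function looks it up through a bin index.

```python
N_BINS = 10            # assumed number of equal-width bins
LOW, HIGH = 0.0, 1.0   # assumed range of the continuous state

values = [0.0] * N_BINS  # one value estimate per bin


def bin_index(x):
    """Return the index of the bin containing x, clamped to the valid range."""
    frac = (x - LOW) / (HIGH - LOW)
    i = int(frac * N_BINS)
    return min(max(i, 0), N_BINS - 1)


def v(x):
    """Value estimate for continuous state x, shared by every state in its bin."""
    return values[bin_index(x)]


def update(x, target, alpha=0.5):
    """Move the estimate of x's bin toward the target."""
    i = bin_index(x)
    values[i] += alpha * (target - values[i])


update(0.42, 1.0)
print(v(0.42), v(0.47), v(0.52))  # 0.5 0.5 0.0
```

An update made at `0.42` also changes the estimate for `0.47` because both fall in the same bin, while `0.52` lies in the next bin and is unaffected. This is the same generalisation within a group that state aggregation gives in the discrete case.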
