In Exercise 3.6 of the book 'An Introduction to Reinforcement Learning' by Sutton, R. and Barto, A., they ask the following question at the very end of Section 3.5 (which introduces the Markov property).
Broken Vision System: Imagine that you are a vision system. When you are first turned on for the day, an image floods into your camera. You can see lots of things, but not all things. You can't see objects that are occluded, and of course you can't see objects that are behind you.
i) After seeing that first scene, do you have access to the Markov state of the environment?
ii) Suppose your camera was broken that day and you received no images at all, all day. Would you have access to the Markov state then?
I don't quite understand what they are asking. What is the 'Markov state of the environment'?
What I've thought so far:
i) It doesn't have the Markov property, since what the state of the environment will be in the next image does not depend entirely on what is in the current image (although the current image may well be a good approximation). I don't quite know what that says about the Markov state of the environment, though.
Or is the 'Markov state of the environment' just all the information in the environment at that point in time? In that case, the vision system can't see occluded objects or objects outside its field of view, so it doesn't have access to all of that information (i.e., the state).
ii) I think it's best I wait for feedback on the first question before making any assumptions about the answer to this second question.
Thanks, I'd be very grateful if you could help me patch up my understanding 🙂
Best Answer
Here's the first, informal definition of Markov state given in that section (emphasis mine):
For example, a checkers position–the current configuration of all the pieces on the board–would serve as a Markov state because it summarizes everything important about the complete sequence of positions that led to it. Much of the information about the sequence is lost, but all that really matters for the future of the game is retained.
I won't duplicate the full formal definition, but it concludes thus:
In other words, a state signal has the Markov property, and is a Markov state, if and only if (3.5) is equal to (3.4) for all $s'$, $r$, and histories $s_t, a_t, r_t, \dots, r_1, s_0, a_0$.
Note that (3.4) and (3.5) are the conditional probabilities of the next state-reward pair, conditioned on the entire history in (3.4) and on just the current state and action in (3.5).
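For reference, here is a rough reconstruction of those two expressions (paraphrased from the first edition, so the exact notation may differ slightly from your copy):

$$\Pr\{s_{t+1}=s',\ r_{t+1}=r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, r_1, s_0, a_0\} \tag{3.4}$$

$$\Pr\{s_{t+1}=s',\ r_{t+1}=r \mid s_t, a_t\} \tag{3.5}$$

The state signal is Markov exactly when these two quantities are equal for every $s'$, $r$, and possible history.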
Here's another useful point:
It also follows that Markov states provide the best possible basis for choosing actions. That is, the best policy for choosing actions as a function of a Markov state is just as good as the best policy for choosing actions as a function of complete histories.
If forced to wager, I'd suggest that the intent of the first question was to get the reader to reason through this definition and argue one way or the other. Example 3.5, the pole balancing problem, provides a useful skeleton:
In the pole-balancing task introduced earlier, a state signal would be Markov if it specified exactly, or made it possible to reconstruct exactly, the position and velocity of the cart along the track, the angle between the cart and the pole, and the rate at which this angle is changing (the angular velocity). In an idealized cart-pole system, this information would be sufficient to exactly predict the future behavior of the cart and pole, given the actions taken by the controller.
All of which is to say: I think you're on the right track with an answer of "No, and here's why." In particular, a single image would give no indication of prior movement.
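To make that last point concrete, here's a minimal sketch (my own toy example, not from the book) of why a single snapshot fails to be Markov while a fuller state, analogous to the cart-pole's position plus velocity, succeeds. Everything in it (the 1-D "cart", the variable names) is made up for illustration.

```python
# Toy illustration (not from the book): a 1-D "cart" moving with constant
# velocity, observed at discrete time steps with step size dt = 1.

def next_position(position, velocity, dt=1.0):
    """The full state (position, velocity) determines the next position exactly."""
    return position + velocity * dt

# Two different histories that end in the same snapshot (position = 0.0):
# one cart drifting right, one drifting left.
history_a = [(-2.0, 1.0), (-1.0, 1.0), (0.0, 1.0)]   # (position, velocity) pairs
history_b = [(2.0, -1.0), (1.0, -1.0), (0.0, -1.0)]

# Snapshot observation: position only. Both histories look identical "now".
snapshot_a = history_a[-1][0]
snapshot_b = history_b[-1][0]
assert snapshot_a == snapshot_b == 0.0

# But the futures differ, so the snapshot alone cannot be a Markov state:
print(next_position(*history_a[-1]))   # 1.0  -> cart A moves right next step
print(next_position(*history_b[-1]))   # -1.0 -> cart B moves left next step

# The pair (position, velocity) IS Markov here: identical pairs always yield
# identical next positions, no matter how the cart reached them.
```

A single camera image is analogous to the position-only snapshot: it omits velocities (and anything occluded or behind you), so the complete history would in general tell you more about the future than the image alone does.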