In OpenAI Gym's Taxi-v2 there are 500 states in total. That seems far too many for a 5*5 grid.
What am I missing?
This task was introduced in [Dietterich2000] to illustrate some issues in hierarchical reinforcement learning. There are 4 locations (labeled by different letters) and your job is to pick up the passenger at one location and drop him off in another. You receive +20 points for a successful dropoff, and lose 1 point for every timestep it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.
From the paper "Hierarchical Reinforcement Learning: Learning sub-goals and state-abstraction":
In terms of state space there are 500 possible states: 25 squares, 5 locations for the passenger (counting the four starting locations and the taxi), and 4 destinations.
With enumerated states, the size of the state space is the product of the number of values each state variable can take. The taxi can be on any of the 25 grid squares, the passenger can be in any of 5 locations (one of the four special locations, including one they did not want to be dropped at, or inside the taxi), and the passenger's destination can be any of the 4 special locations. These values are all independent and can occur in any combination. Hence 25 * 5 * 4 = 500 total states.
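To make the multiplication concrete, here is a minimal sketch that enumerates every combination and packs it into a single integer with a mixed-radix encoding. The function and constant names are my own for illustration, and the ordering of the dimensions is an assumption, not necessarily the exact layout the environment uses internally.

```python
from itertools import product

# Sizes of each independent state dimension (from the counts above).
N_ROWS, N_COLS = 5, 5      # taxi position on the 5*5 grid -> 25 squares
N_PASSENGER_LOCS = 5       # 4 special locations + "in the taxi"
N_DESTINATIONS = 4         # the 4 special locations


def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Pack the four state variables into one integer (hypothetical ordering)."""
    index = taxi_row
    index = index * N_COLS + taxi_col
    index = index * N_PASSENGER_LOCS + passenger_loc
    index = index * N_DESTINATIONS + destination
    return index


def decode(index):
    """Inverse of encode: recover the four state variables."""
    index, destination = divmod(index, N_DESTINATIONS)
    index, passenger_loc = divmod(index, N_PASSENGER_LOCS)
    taxi_row, taxi_col = divmod(index, N_COLS)
    return taxi_row, taxi_col, passenger_loc, destination


# Every combination of the independent dimensions is a distinct state.
all_states = list(product(range(N_ROWS), range(N_COLS),
                          range(N_PASSENGER_LOCS), range(N_DESTINATIONS)))
indices = {encode(*state) for state in all_states}

print(len(all_states))   # number of combinations
print(len(indices))      # number of distinct integer ids
```

Each combination maps to a unique integer in the range 0 to 499, which is why a tabular method such as Q-learning needs a table with 500 rows for this environment.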
The terminal state is when the passenger location is the same as the destination location. Technically there are some unreachable states which show the passenger at the destination and the taxi somewhere else. But the state representation still includes those, because it is easier to do so.
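As a rough check on how many of the 500 states fall into that unreachable category, the sketch below counts states where the passenger is already at the destination but the taxi is on a different square. The coordinates of the four special locations are an assumption on my part (the count only relies on each one being a single square), and the variable names are purely illustrative.

```python
from itertools import product

GRID_SIZE = 5
IN_TAXI = 4  # passenger location index meaning "inside the taxi"

# Assumed (row, col) coordinates of the four special locations.
SPECIAL_LOCATIONS = [(0, 0), (0, 4), (4, 0), (4, 3)]

total = 0
unreachable = 0          # passenger at destination, taxi somewhere else
terminal_reachable = 0   # passenger at destination, taxi on that square

for row, col, passenger_loc, destination in product(
        range(GRID_SIZE), range(GRID_SIZE), range(5), range(4)):
    total += 1
    if passenger_loc != IN_TAXI and passenger_loc == destination:
        if (row, col) == SPECIAL_LOCATIONS[destination]:
            terminal_reachable += 1
        else:
            unreachable += 1

print(total, unreachable, terminal_reachable, total - unreachable)
```

Under these assumptions, 96 of the 500 enumerated states are of the unreachable kind described above, leaving 404 that can actually occur, 4 of which are terminal.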