I have seen many questions here asking whether it is a good idea to use AIC/BIC to determine the optimal number of hidden states for an HMM. But what about the number of observable symbols?
In my case, I am using a discrete HMM: I quantise a continuous time-series observation signal to obtain a sequence of discrete emissions. I train a number of HMMs and then use the AIC to find the "best" one. Each HMM has a different number of hidden states (3 to 9) and a different size of the quantised observation alphabet, i.e. the number of values an observation can take after quantisation (4 to 128).
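The quantisation step can be sketched roughly as below (a minimal example with a synthetic signal standing in for the real observations; the signal, bin count, and equal-width binning are all assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-in for the continuous time-series observations.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 20.0, 500)) + 0.1 * rng.normal(size=500)

def quantise(x, n_levels):
    """Map a continuous signal onto integer symbols 0 .. n_levels-1
    using equal-width bins over the observed range."""
    edges = np.linspace(x.min(), x.max(), n_levels + 1)
    # np.digitize returns bin indices 1 .. n_levels+1; shift to 0-based
    # and clip so the maximum value lands in the last bin.
    return np.clip(np.digitize(x, edges) - 1, 0, n_levels - 1)

symbols = quantise(signal, 16)  # discrete emission sequence for a 16-symbol HMM
```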
When I use AIC, it barely helps: for models with many states the log-likelihood is already so low (around -2000) that the penalty AIC adds for the free parameters hardly changes the ranking, and the log-likelihood is always better (around -300) for models with few states.
Given this, does it make sense to use AIC at all, or should I compare the models (with their different numbers of free parameters) using the log-likelihood alone?
AIC is based on the likelihood, and likelihoods can only be compared when they are computed over the same set of observations. If you have underlying data $x$ and categorize it in two different ways to form $y=f(x)$ and $z=g(x)$ (both derived from the same $x$, but measured on two different, relatively coarse scales due to the two categorizations), then you cannot directly compare the likelihoods – nor the AICs – of models fitted to $y$ and $z$. In your setup, each choice of quantisation level (4 vs. 128 symbols) defines a different discrete sample space, so the models are fitted to different data and their likelihoods are not on a common scale. AIC remains valid for comparing HMMs with different numbers of hidden states, provided they are trained on the same quantised sequence.
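For the valid case (same symbol sequence, varying hidden-state count), the comparison can be sketched as below. The log-likelihood values are hypothetical placeholders, and the free-parameter count uses the standard accounting for a discrete HMM (initial, transition, and emission probabilities, each row constrained to sum to one):

```python
def hmm_free_params(n_states, n_symbols):
    """Free parameters of a discrete HMM:
    (n_states - 1) initial probabilities
    + n_states * (n_states - 1) transition probabilities
    + n_states * (n_symbols - 1) emission probabilities."""
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)

def aic(log_likelihood, k):
    return 2 * k - 2 * log_likelihood

# Hypothetical log-likelihoods from HMMs trained on the SAME 16-symbol sequence.
log_liks = {3: -310.0, 5: -305.0, 9: -298.0}
n_symbols = 16

scores = {n: aic(ll, hmm_free_params(n, n_symbols)) for n, ll in log_liks.items()}
best = min(scores, key=scores.get)  # lowest AIC wins
```

With these placeholder numbers the 3-state model wins: the modest likelihood gain from extra states does not offset the parameter penalty, which mirrors the behaviour described in the question.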