I have the state – transition data with multiple sequences of unequal lengths, each sequence corresponding to one subject/person:

`head(prob_tbl2) period1 period2 period3 period4 period5 period6 period7 period8 1 <NA> active active active active active active active 2 active active active active active active active active 3 active active active active active active active active 4 <NA> <NA> active lapsed1 active lapsed1 active lapsed1 5 <NA> <NA> active lapsed1 lapsed2 lapsed3 lapsed4 lapsed5 6 <NA> active lapsed1 lapsed2 lapsed3 lapsed4 lapsed5 lapsed6 `

Based on this data, I created a probability matrix (as suggested here):

`trans.matrix <- function(X, prob=T) { tt <- table( c(X[,-ncol(X)]), c(X[,-1]) ) if(prob) tt <- tt / rowSums(tt) tt } prob_table= trans.matrix(as.matrix(prob_tbl2)) prob_table active lapsed1 lapsed2 lapsed3 lapsed4 lapsed5 lapsed6 lapsed7 active 0.56006735 0.43993265 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 lapsed1 0.19183552 0.00000000 0.80816448 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 lapsed2 0.10206773 0.00000000 0.00000000 0.89793227 0.00000000 0.00000000 0.00000000 0.00000000 lapsed3 0.06437215 0.00000000 0.00000000 0.00000000 0.93562785 0.00000000 0.00000000 0.00000000 lapsed4 0.04468764 0.00000000 0.00000000 0.00000000 0.00000000 0.95531236 0.00000000 0.00000000 lapsed5 0.04115773 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.95884227 0.00000000 lapsed6 0.02306733 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.97693267 `

What is the way I could validate those probabilities, e.g. receive stadard errors/confidence intervals and cross-validate them against a new/unseen dataset? Could I bootstrap my data or apply MLE to get the uncertainty levels, if so, how to do it in R?

I tried to use `library(markovchain)`

, with no much luck with my dataset, I described the problem here

Clearly, I'm new to Markov chains so any hints will be helpful!

#### Best Answer

A general approach is to look at likelihood of data given model parameters, i.e. $P(text{sequences}|text{matrix elements})$, or, for practical reasons, log-likelihood. For (not-hidden) Markov model, it is simply $log L = sum log m_{ij}$, where we sum over all transitions that have occurred.

As you see, it does not matter what is the number of sequences, or their lengths (they may be all different). Be warned that a single occurrence of a transition with the matrix element being 0 makes likelihood zero.

In general, you compare it against baselines of uniformly distributed symbols, and symbols distributed according to their observed distribution. See also: perplexity (for measuring performance of sequence prediction).

You can easily compute log-likelihood per transition for both train and validation data.

For a general reference, I recommend:

- LR Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 1989

Measuring uncertainty of this model can be more complicated and depend on the assumptions (as always). Below I mention two general approaches, which can be tailored for Markov model.

### Approach 1: Gaussian approximation of likelihood

You need to be careful with the stochastic matrix conditions when differentiating log-likelihood.

See: MacKay's Information Theory, Inference, and Learning Algorithms Chapter IV Probabilities and Inference: Laplace's Method

### Approach 2: Bayesian model (with matrix elements being random variables)

Depends on prior (I would go with a flat prior for stochastic matrix), and takes a lot of computational power (Markov Chain Monte Carlo, not to be confused with Markov Chain you are using), but gives interpretable results (and no risk of zero estimates).

Again, MacKay is wonderful at explaining this approach. For the technical part I recommend PyMC3 in Python, especially with this tutorial:

I guess there are some tools in R, but I've never used them.

### Similar Posts:

- Solved – Markov chains that do not contain all the states in the model
- Solved – Markov chains that do not contain all the states in the model
- Solved – MLE for Markov Chains – intuitive explanation
- Solved – Implementing EM clustering for a mixture markov model
- Solved – Calculate the probability for a sequence generated by a graph