In this popular question, high upvoted answer makes MLE and Baum Welch separate in HMM fitting.

For training problem we can use the following 3 algorithms: MLE (maximum likelihood estimation), Viterbi training(DO NOT confuse with Viterbi decoding), Baum Welch = forward-backward algorithm

BUT in Wikipedia, it says

The Baum–Welch algorithm uses the well known EM algorithm to find the maximum likelihood estimate of the parameters

**So, what's the relationship between MLE and Baum–Welch algorithm?**

My attempt: The objective for Baum–Welch algorithm is maximize likelihood, but it uses a specialized algorithm (EM) to solve the optimization. We still can maximize likelihood by using other methods such as gradient decent. This is why the answer make two algorithm separate.

Am I right and can anyone help me to clarify?

**Contents**hide

#### Best Answer

Refer to one of the answers (by Masterfool) from the question link you provided,

Morat's answer is false on one point: Baum-Welch is an Expectation-Maximization algorithm, used to train an HMM's parameters. It uses the forward-backward algorithm during each iteration. The forward-backward algorithm really is just a combination of the forward and backward algorithms: one forward pass, one backward pass.

And I agree with PierreE's answer here, Baum–Welch algorithm is used to solve maximum likelihood in HHM. If the states are known (supervised, labeled sequence), then other method maximizing MLE is used (maybe like, simply count the frequency of each emission and transition observed in the training data, see the slides provided by Franck Dernoncourt).

In the setting of MLE for HMM, I don't think you can just use gradient descent, since the likelihood (or, log-likelihood) doesn't have a closed-form solution and must be solved iteratively, same as the case in mixture models so then we turn to EM. (See more details in Bishop, Pattern Recognition book, chapter 13.2.1 Pg614)