# Solved – X-mean algorithm BIC calculation question

I'm having trouble understanding some of the formulas in this paper related to BIC calculation (Dan Pelleg and Andrew Moore, X-means: Extending K-means with Efficient Estimation of the Number of Clusters).

First the variance equation:

• R – number of points
• K – number of clusters
• \$mu_i\$ – centroid associated with ith point.
• \$sigma^2 = frac{1}{R-K}sum_{i}(x_i – mu_{(i)})^2 \$

The log likelyhood then uses this sigma.
Am I reading this right, they're using 1 covariance matrix for all clusters (see quote below, they are)? This makes no sense. If you have 5 clusters, each one is a Gaussian according to k-means algorithm. So wouldn't it make sense to compute covariance \$sigma^2_i\$ for each cluster and use that?

My second question is regarding number of parameters to use in the BIC score. The paper mentions

Number of free parameters \$p_j\$ is simply the sum of K-1 class
probabilities, M*K centroid coordinates, and one variance estimate.

How do you get the K-1 class probabilities? I could do # of points in class i / total number of points. But then it's K-1, which probability is left out of the sum?

P.S. If anyone has a nicer paper on estimating k using similar methods I'd like to read that as well. At this point I'm not too concerned with speed.

Contents