Solved – BIC or AIC to determine the optimal number of clusters in a scale-free graph

I am currently trying to partition a scale-free ("big") graph (around 20k vertices, 500k edges) into appropriate subgraphs. Having derived the Laplacian of the graph, I tried an approach based on the spectral gap and the Fiedler vector; however, not entirely unexpectedly, I ended up with vertex valuations (i.e. components of the corresponding eigenvector) close to zero for the majority of the nodes. Clearly, there is no obvious cut in the graph.
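For context, the Fiedler-vector approach I tried can be sketched as follows. This is a minimal illustration on a toy graph (two 4-cliques joined by a single bridge edge), not the 20k-vertex graph in question:

```python
# Sketch: spectral bisection via the Fiedler vector of the graph Laplacian.
# Toy graph only; the real graph would be a large sparse matrix.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),   # clique A
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),   # clique B
         (3, 4)]                                           # bridge edge
n = 8
rows, cols = zip(*edges)
A = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
A = (A + A.T).tocsr()                      # symmetrize: undirected graph

L = laplacian(A)                           # combinatorial Laplacian L = D - A
vals, vecs = eigsh(L, k=2, which='SM')     # two smallest eigenpairs via ARPACK
order = np.argsort(vals)
fiedler = vecs[:, order[1]]                # the "valuation" of each vertex
part = fiedler >= np.median(fiedler)       # median split into two halves
```

On a graph with a clean cut, the Fiedler components separate into two well-spaced groups; the failure mode described above is that most components cluster near zero instead.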

Nevertheless, even if only to show that several methods fail on graphs with the spectral features of the one I am working on, I would like to explore spectral clustering approaches further; some of these require a fixed k denoting the number of partitions.

I am aware of the use of the BIC and AIC with k-means clustering. What interests me is whether these criteria are also used in the realm of spectral graph clustering. Is there any rationale for establishing a link between the spectra of graphs and model selection criteria like the BIC and AIC?
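For reference, the k-means-style use of these criteria usually amounts to fitting a model for each candidate k and scoring each fit. A common concrete route (a sketch on synthetic data, using scikit-learn's Gaussian mixtures, of which k-means is a limiting case) looks like this:

```python
# Sketch: selecting k by BIC over a range of mixture-model fits.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: three well-separated 2-D blobs of 50 points each.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ((0.0, 0.0), (5.0, 5.0), (0.0, 5.0))])

# Fit a mixture for each k and score it with BIC (lower is better);
# AIC works identically via .aic(X).
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 7)}
best_k = min(bic, key=bic.get)   # should recover the three blobs
```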

Any input is much appreciated!


So, I have run a few tests. I tried RSB (recursive spectral bisection) with the median as the cutoff value c. To validate against, I used high-evidence cluster data (low false-positive rate, possibly high false-negative rate), roughly 250 non-overlapping groups, in a rather "poor man's" fashion, so nothing fancy at all. The initial cut already affected more than 235 of these clusters, even though many of them are actually rather small (we are talking about an average size of around 75 here). I also tried deviating from the median by the MAD (towards the valuation with the highest absolute value), which performed badly as well. After some further tries, I ended up choosing the 1st or 3rd quartile of the valuation distribution, which allowed some small and rather trivial cuts. Nevertheless, the spectral gap never looked promising, and the characteristic valuation was simply horrible.
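The cutoff experiment (median versus quartile of the valuation distribution) amounts to something like the following; `valuation` here is a synthetic stand-in for the eigenvector components, not the real data:

```python
# Sketch: splitting a vertex valuation at a chosen quantile of its distribution.
import numpy as np

def split_at(valuation, q):
    """Partition vertex indices by comparing the valuation to its q-quantile."""
    c = np.quantile(valuation, q)
    left = np.where(valuation < c)[0]
    right = np.where(valuation >= c)[0]
    return left, right

# A valuation like the one described: most components near zero, a few outliers.
rng = np.random.default_rng(1)
valuation = np.concatenate([rng.normal(0.0, 1e-4, 95),
                            rng.normal(0.5, 0.05, 5)])

med_l, med_r = split_at(valuation, 0.50)   # median cut: two equal halves
q3_l, q3_r = split_at(valuation, 0.75)     # 3rd-quartile cut: smaller split-off
```

The median cut forces a balanced bisection even when the valuation has no gap near its center, which is one way to end up slicing through many ground-truth clusters at once.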

For computing the eigenvectors I used ARPACK (IRLM), so I expect the results to be reasonably accurate in double precision. Here is a plot of the characteristic valuation (log2, just quick and dirty) after the first 2 iterations (both of which yielded clusters of roughly 36 nodes each); the core seems to be too dense.


I thought about at least buying Fan Chung's more recent book on spectral clustering, since I liked reading through her previous work (at least the first two chapters). It was dry to the bone, but nevertheless quite informative.

AIC and BIC are used for constraining the complexity of a range of models competing to explain the same data. Following on the heels of results in complexity theory and machine learning, there is reason to believe that in some scenarios, using these proxies for complexity gives useful info about the model that explains the data best while having the smallest generalization error.

In this case, you would first need to propose some model that produces the graph spectra you are seeing. You could, for example, postulate a mixture of different components that together generate the graph you observe.

For example, if you were working with the common telephone data set from Belgium (where there is a trivial cut to make between those who speak French and those who speak Dutch), you would need to postulate the mechanism (language preference correlated to region or something) that actually causes the two different segments of the graph.

Then, once you have a model, you would use AIC or BIC as a means for optimizing the parameter choices for fitting that model with your observed graph data. If the number of different causal clusters is something you include in the model, then it's something your optimization routine will spit back out under AIC or BIC constraints.
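A minimal sketch of that workflow, under the assumption that a spectral embedding plus a Gaussian mixture is an acceptable stand-in for the postulated generative model: embed the graph spectrally, fit mixtures with varying component counts, and let BIC choose among them. The two-clique toy graph is illustrative only:

```python
# Sketch: number-of-clusters selection by BIC on a spectral embedding.
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.mixture import GaussianMixture

# Toy graph: two 10-node cliques joined by a single edge.
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1.0
A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
A[9, 10] = A[10, 9] = 1.0

# Spectral embedding of the graph (adjacency passed as a precomputed affinity).
emb = SpectralEmbedding(n_components=2, affinity='precomputed').fit_transform(A)

# Fit mixtures over a range of component counts; BIC penalizes the extra
# parameters of each additional component.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(emb).bic(emb)
       for k in range(1, 5)}
best_k = min(bic, key=bic.get)
```

Whether the k this spits back out means anything depends entirely on whether the mixture-on-embedding model actually resembles the process generating the graph, which is the answer's point.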

But it's not really the same thing as normalized cut, which does not exactly model anything about the graph. Normalized cut (and other spectral segmentation methods) proposes a cost function that may or may not be related to anything about your graph, and then makes cuts so as to minimize the cost.
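For concreteness, the normalized-cut cost functional referred to here (Shi and Malik's definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)) can be written out directly; the two-triangle toy graph is illustrative:

```python
# Sketch: the normalized-cut cost of a given bipartition of a weighted graph.
import numpy as np

def ncut(W, mask):
    """Normalized cut of the partition (mask, ~mask) over adjacency matrix W."""
    cut = W[mask][:, ~mask].sum()    # total weight of edges crossing the cut
    assoc_a = W[mask].sum()          # total degree of side A (includes cut)
    assoc_b = W[~mask].sum()         # total degree of side B
    return cut / assoc_a + cut / assoc_b

# Two triangles joined by one bridge edge: cutting the bridge is cheap,
# while isolating a single vertex is penalized by the degree normalization.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
bridge_cut = ncut(W, np.array([True, True, True, False, False, False]))
lopsided_cut = ncut(W, np.array([True, False, False, False, False, False]))
```

The functional is purely combinatorial: it rewards balanced, weakly connected splits regardless of whether any generative process in the data actually produced such a structure.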

If the cost functional used by the segmentation method does not meaningfully correspond to aspects of the data generating process that gives rise to your graph, then the segmentation performance can be arbitrarily bad. As you say, this is a reason why one-size-fits-all graph cutting procedures must be used delicately, with careful consideration of what the cost functional means for your data.

I have a few sources for this, and I'll come back and document it later this evening.
