Is correlation between parameters a problem when fitting a Bayesian model using MCMC?

Assuming some Bayesian model, for example:

$$ y \sim N(X\beta, \sigma) $$

where this model has:

Response vector:

$$ y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{pmatrix} $$

Predictor matrix:

$$ X = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1p} \\ x_{21} & x_{22} & \ldots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{np} \end{pmatrix} $$

Parameter set:

$$ \beta = \begin{pmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{pmatrix}, \text{ and } \sigma $$

Prior distributions:

$$ p(\beta_{i} \mid \theta_{\beta_{i}}), \text{ for } i = 0,\dots,p, \text{ and } p(\sigma \mid \theta_{\sigma}) $$

where $\theta_{\alpha}$ denotes the set of hyperparameters corresponding to the probability distribution for parameter $\alpha$.

Now, multicollinearity amongst the predictor variables in $X$ can make individual coefficients poorly identified, inflating the variance of their estimates.

Also, correlation within the sample draws for a given parameter can lead to unreliable posterior distributions (i.e. $ p(\beta_{i}^{t} \mid \theta_{\beta_{i}}, \beta_{i}^{t-1}, \ldots, \beta_{i}^{1}) \ne p(\beta_{i}^{t} \mid \theta_{\beta_{i}}) $).
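
As a concrete illustration of this within-chain autocorrelation (my own sketch, not from the question: an AR(1) process stands in for the MCMC draws of a single parameter, and every constant is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# AR(1) chain with a N(0, 1) stationary distribution -- a stand-in for
# autocorrelated MCMC draws of one parameter (rho = 0.9 is made up).
rho, n = 0.9, 100_000
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()

# The sample lag-1 autocorrelation is near rho, not zero:
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]

# The naive i.i.d. standard error of the posterior-mean estimate
# understates the true Monte Carlo error, here by a known factor of
# sqrt((1 + rho) / (1 - rho)), about 4.4:
naive_se = x.std(ddof=1) / np.sqrt(n)
inflation = np.sqrt((1 + rho) / (1 - rho))
true_se = naive_se * inflation
```

The point is that the draws still have the right stationary distribution; only the error of Monte Carlo averages is inflated.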

However, is correlation between parameters problematic (i.e. $ p(\beta_{i} \mid \theta_{\beta_{i}}, \beta_{j}) \ne p(\beta_{i} \mid \theta_{\beta_{i}}) $ for $ i \ne j $)? By problematic, I mean both in terms of causing algorithmic problems and interpretability problems, insofar as the two can be separated.

It seems to me that correlation between parameters could be problematic in practice (particularly when the goal of the analysis is to focus on the marginalized distributions of a subset of the parameters), because the MCMC procedure may need much longer to explore the support of all the parameters efficiently.
However, in theory, regardless of how long this takes, if the chain(s) mix sufficiently well (a necessary condition for any MCMC-based Bayesian analysis), then the joint posterior distribution is sufficient to extract marginal distributions by integrating out nuisance parameters.

In other words, the issue is one of runtime and feasibility. Without optimizations and speed-up tricks, correlation between parameters can lead to longer runtimes before the chain adequately approximates the true joint posterior distribution, but both algorithmic robustness and interpretability are unaffected, provided the MCMC routine is given enough time to explore the parameter space sufficiently.

TLDR: There will almost always be correlation between the parameters in the Bayesian world, even if you use independent priors. Correlation might affect mixing rates, but that is going to be on a case-by-case basis. In general, you want to ask, "how can I account for the correlations?" Multivariate estimators of the asymptotic covariance of your estimators let you do that.

There are two "correlations" between parameters that are present in this scenario (and, in general, in most MCMC scenarios):

  1. Correlation between parameters in the posterior. So the true covariance in the posterior is not a diagonal matrix.
  2. Lag correlation across parameters as a consequence of MCMC sampling, e.g. $\mathrm{Cov}\left(\beta_1^{(1)}, \beta_2^{(1+k)}\right)$. (Look at a cross-correlation plot to see how significant these lag correlations are.)
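
Both kinds of correlation show up numerically in a toy Gibbs sampler (my own illustration, not part of the original answer) targeting a bivariate normal "posterior" with correlation $r$; the two coordinates stand in for two correlated coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Gibbs sampler for a bivariate normal with correlation r (r = 0.8 is
# illustrative): alternate draws from the two full conditionals.
r, n = 0.8, 200_000
draws = np.empty((n, 2))
x = y = 0.0
sd = np.sqrt(1 - r**2)
for t in range(n):
    x = r * y + sd * rng.standard_normal()   # draw x | y
    y = r * x + sd * rng.standard_normal()   # draw y | x
    draws[t] = x, y

# (1) correlation between parameters in the posterior (close to r):
post_corr = np.corrcoef(draws[:, 0], draws[:, 1])[0, 1]

# (2) lag-1 cross-correlation induced by the sampler (close to r**3
# for this particular update scheme):
lag1_cross = np.corrcoef(draws[:-1, 0], draws[1:, 1])[0, 1]
```

Correlation (1) is a property of the target posterior itself; correlation (2) exists only because the draws come from a Markov chain.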

Just like in frequentist settings, correlation does not necessarily bias the point estimates of the posterior means, but it affects the quality of those estimates. So if $\mu = (\beta_0, \dots, \beta_p, \sigma)$ is the $(p+2)$-dimensional vector of interest, and you obtain $N$ MCMC samples $\mu_i$, the point estimate is $$\mu_N = \frac{1}{N}\sum_{i=1}^{N} \mu_i. $$

The two correlations mentioned above affect the quality of this estimator since, by the Markov chain CLT, $$\sqrt{N}(\mu_N - \mu) \overset{d}{\to} N_{p+2}(0, \Sigma), $$

where $\Sigma$ is a $(p+2) \times (p+2)$ covariance matrix. Interestingly, $\Sigma$ decomposes nicely and captures exactly the two correlations mentioned above: $$\Sigma = \underbrace{\mathrm{Var}(\mu_1)}_{\text{posterior covariance structure}} + \underbrace{2 \sum_{k=1}^{\infty} \mathrm{Cov}(\mu_1,\mu_{1+k})}_{\text{covariance due to correlated samples}}. $$
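
One standard way to estimate $\Sigma$ is the batch-means method; here is a minimal numpy sketch (my own code and naming, not any package's API), sanity-checked on a chain whose asymptotic variance is known in closed form:

```python
import numpy as np

def batch_means_cov(chain):
    """Batch-means estimate of the asymptotic covariance Sigma from an
    (N, d) array of MCMC draws: split the chain into ~sqrt(N)-sized
    batches; the scaled covariance of the batch means estimates Sigma."""
    n, d = chain.shape
    b = int(np.sqrt(n))                  # batch size
    a = n // b                           # number of batches
    means = chain[: a * b].reshape(a, b, d).mean(axis=1)
    centered = means - chain[: a * b].mean(axis=0)
    return b * (centered.T @ centered) / (a - 1)

# Sanity check on a scalar AR(1) chain with rho = 0.5, whose true
# asymptotic variance is (1 + rho) / (1 - rho) = 3:
rng = np.random.default_rng(2)
rho, n = 0.5, 100_000
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()

sigma_hat = batch_means_cov(x[:, None])[0, 0]   # roughly 3
```

With independent draws the batch-means estimate collapses to the ordinary sample covariance, so the second term of the decomposition is exactly what the batching recovers.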

Just like in the usual MLE setup, if you can account for $\Sigma$, you account for all the correlation in the estimation process. Recently, consistent estimators of $\Sigma$ have been proposed, and you can now actually use them to say, "well, this is the amount of error in my estimator; do I have enough samples?" The R package `mcmcse` lets you estimate $\Sigma$. You can also use its functions `multiESS` and `minESS` to find out how many effective samples you need and what your effective sample size is. These calculations are done using estimates of $\Sigma$, and thus account for the correlation.
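
`mcmcse` is an R package; as a rough Python sketch of the same idea (my own code, not `mcmcse`'s implementation), the multivariate effective sample size is $N \left(\det\Lambda / \det\Sigma\right)^{1/d}$, with $\Lambda$ the posterior covariance and $\Sigma$ estimated here by batch means on a toy Gibbs chain (all constants illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 2-parameter chain: Gibbs sampler for a correlated bivariate
# normal with r = 0.8.
r, n = 0.8, 100_000
chain = np.empty((n, 2))
x = y = 0.0
sd = np.sqrt(1 - r**2)
for t in range(n):
    x = r * y + sd * rng.standard_normal()
    y = r * x + sd * rng.standard_normal()
    chain[t] = x, y

# Batch-means estimate of the asymptotic covariance Sigma.
b = int(np.sqrt(n))
a = n // b
means = chain[: a * b].reshape(a, b, 2).mean(axis=1)
centered = means - chain[: a * b].mean(axis=0)
sigma = b * (centered.T @ centered) / (a - 1)

# Multivariate ESS: N * (det(Lambda) / det(Sigma))**(1/d), where Lambda
# is the estimated posterior covariance. ESS comes out well below N.
lam = np.cov(chain, rowvar=False)
ess = n * (np.linalg.det(lam) / np.linalg.det(sigma)) ** (1 / 2)
```

The gap between `ess` and the nominal `n` is exactly the price paid for the two correlations above; if the gap is too large for your error tolerance, run the chain longer.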

This paper explains in detail.
