This question was asked on Physics Stack Exchange but didn't get an answer, and it was suggested that this would be a better place for it. Two years later I am wondering the same thing. Here is the question, with slightly different wording:
How can a linear Gaussian conditional probability distribution be represented in canonical form?
For example, let $\mathbf{X}$ and $\mathbf{Y}$ be two sets of continuous variables, with $|\mathbf{X}| = n$ and $|\mathbf{Y}| = m$. Let
$p(\mathbf{Y} \mid \mathbf{X}) = \mathcal{N}(\mathbf{Y} \mid \mathbf{a} + B\mathbf{X}; C)$
where $\mathbf{a}$ is a vector of dimension $m$, $B$ is an $m$ by $n$ matrix, and $C$ is an $m$ by $m$ matrix.
How does one represent that in canonical form?
This puzzles me in particular, since a linear Gaussian is not necessarily a Gaussian probability distribution.
The canonical representation of a Gaussian has
$K = \Sigma^{-1}$ and $\mathbf{h} = \Sigma^{-1} \boldsymbol{\mu}$.
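To be concrete, by canonical form I mean the usual parametrization
$\mathcal{C}(\mathbf{x}; K, \mathbf{h}, g) = \exp\left(-\tfrac{1}{2}\mathbf{x}^T K \mathbf{x} + \mathbf{h}^T \mathbf{x} + g\right)$
where for a $d$-dimensional Gaussian $g = -\tfrac{1}{2}\boldsymbol{\mu}^T \mathbf{h} - \log\left((2\pi)^{d/2} |\Sigma|^{1/2}\right)$.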
How can one have a $K$ and $\mathbf{h}$ for something that is not a Gaussian?
Best Answer
I have an answer, found with help from two technical reports [1], [2]. (I can only post one link; I will post the other one in a comment.) The report [1] only shows the univariate case, so here is my attempt at the multivariate case.
The basic idea is to use Bayes' law: $p(Y \mid X) = \frac{p(X,Y)}{p(X)}$
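This division is convenient in canonical form, because canonical forms are closed under multiplication and division: the parameters simply add or subtract (after zero-padding the smaller scope to match dimensions):
$\frac{\mathcal{C}(\mathbf{x}; K_1, \mathbf{h}_1, g_1)}{\mathcal{C}(\mathbf{x}; K_2, \mathbf{h}_2, g_2)} = \mathcal{C}(\mathbf{x}; K_1 - K_2, \mathbf{h}_1 - \mathbf{h}_2, g_1 - g_2)$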
We know from [2] that the joint distribution of the linear Gaussian is:
$p(X,Y) = \mathcal{N} \left( \begin{pmatrix} \boldsymbol{\mu}_X \\ B \boldsymbol{\mu}_X + \mathbf{a} \end{pmatrix}, \Sigma_{X,Y} \right)$
With the process noise described by $\Sigma_{w}$ (that is, $C = \Sigma_w$ in the notation of the question), we have
$\Sigma_{X,Y} = \begin{pmatrix} B^T \Sigma_{w}^{-1} B + \Sigma_{X}^{-1} & -B^T \Sigma_{w}^{-1} \\ -\Sigma_{w}^{-1} B & \Sigma_{w}^{-1} \end{pmatrix}^{-1} = \begin{pmatrix} \Sigma_{X} & \Sigma_{X} B^{T} \\ B \Sigma_{X} & \Sigma_{w} + B \Sigma_{X} B^T \end{pmatrix}$
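As a sanity check (my own addition, not from the reports), here is a small NumPy sketch that verifies this block-inverse identity numerically; the dimensions and random test matrices are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # arbitrary dimensions for the test

# Random SPD covariances Sigma_X, Sigma_w and an arbitrary matrix B.
A = rng.standard_normal((n, n)); Sigma_X = A @ A.T + n * np.eye(n)
W = rng.standard_normal((m, m)); Sigma_w = W @ W.T + m * np.eye(m)
B = rng.standard_normal((m, n))

Sw_inv = np.linalg.inv(Sigma_w)
Sx_inv = np.linalg.inv(Sigma_X)

# Joint precision matrix (the block matrix whose inverse is Sigma_{X,Y}).
K_joint = np.block([
    [B.T @ Sw_inv @ B + Sx_inv, -B.T @ Sw_inv],
    [-Sw_inv @ B,                Sw_inv      ],
])

# Joint covariance in the second (explicit) form.
Sigma_joint = np.block([
    [Sigma_X,     Sigma_X @ B.T              ],
    [B @ Sigma_X, Sigma_w + B @ Sigma_X @ B.T],
])

# The two block matrices should be inverses of each other.
assert np.allclose(K_joint @ Sigma_joint, np.eye(n + m))
print("block-inverse identity holds")
```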
Now, to get $p(Y \mid X)$, we divide it by $p(X)$, which in canonical form is
$K_X = \Sigma_{X}^{-1}$ and $\mathbf{h}_X = K_X \boldsymbol{\mu}_X$
Dividing it out (subtracting the canonical parameters of $p(X)$, zero-padded to the joint dimension) gives us:
$K_{Y|X} = \begin{pmatrix} B^T \Sigma_{w}^{-1} B & -B^T \Sigma_{w}^{-1} \\ -\Sigma_{w}^{-1} B & \Sigma_{w}^{-1} \end{pmatrix}$, $\mathbf{h}_{Y|X} = \mathbf{0}$, $g_{Y|X} = - \log\left((2 \pi)^{m/2} |\Sigma_{w}|^{1/2}\right)$
with $m$ the dimension of $\mathbf{Y}$ (and $\mathbf{h}_{Y|X}$ the zero vector of dimension $n+m$).
Note that I went with zero-mean process noise and also assumed $\mathbf{a}$ to be zero.
The result in canonical form is not a valid (normalizable) Gaussian on its own, since $K_{Y|X}$ is singular: it factors as $\begin{pmatrix} -B^T \\ I \end{pmatrix} \Sigma_w^{-1} \begin{pmatrix} -B & I \end{pmatrix}$ and therefore has rank $m$, not $n+m$. Multiplying it with $p(X)$, however, gives a valid Gaussian, as one would expect.
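Again as a sketch of my own (assuming $\mathbf{a} = 0$ and zero-mean noise as above), one can check numerically that these canonical parameters reproduce $\mathcal{N}(\mathbf{y} \mid B\mathbf{x}, \Sigma_w)$ exactly and that $K_{Y|X}$ indeed has rank $m$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, m = 3, 2  # arbitrary dimensions for the test

W = rng.standard_normal((m, m)); Sigma_w = W @ W.T + m * np.eye(m)
B = rng.standard_normal((m, n))
Sw_inv = np.linalg.inv(Sigma_w)

# Canonical parameters of p(Y|X) with a = 0 and zero-mean noise.
K = np.block([
    [B.T @ Sw_inv @ B, -B.T @ Sw_inv],
    [-Sw_inv @ B,       Sw_inv      ],
])
h = np.zeros(n + m)
g = -np.log((2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(Sigma_w)))

# K is singular: rank m rather than n + m.
print("rank of K_{Y|X}:", np.linalg.matrix_rank(K))  # -> 2 == m

# Evaluate the canonical form at a random (x, y) and compare it
# with the conditional density N(y; Bx, Sigma_w).
x, y = rng.standard_normal(n), rng.standard_normal(m)
z = np.concatenate([x, y])
canonical = np.exp(-0.5 * z @ K @ z + h @ z + g)
direct = multivariate_normal.pdf(y, mean=B @ x, cov=Sigma_w)
assert np.allclose(canonical, direct)
print("canonical form matches N(y; Bx, Sigma_w)")
```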