This question was asked on Physics Stack Exchange but didn't get an answer, and it was suggested that this would be a better place to ask. Two years later I am still wondering the same thing. Here is the question, with slightly different wording:

How can a linear Gaussian conditional probability distribution be represented in canonical form?

For example, let $\mathbf{X}$ and $\mathbf{Y}$ be two sets of continuous variables, with $|\mathbf{X}| = n$ and $|\mathbf{Y}| = m$. Let

$p(\mathbf{Y} \mid \mathbf{X}) = \mathcal{N}(\mathbf{Y} \mid \mathbf{a} + B\mathbf{X}; C)$

where $mathbf{a}$ is a vector of dimension $m$, $B$ is an $m$ by $n$ matrix, and $C$ is an $m$ by $m$ matrix.

How does one represent that in canonical form?

This puzzles me in particular, since a linear Gaussian is not necessarily a Gaussian probability distribution.

The canonical representation of a Gaussian has

$K = \Sigma^{-1}$ and $\mathbf{h} = \Sigma^{-1} \boldsymbol{\mu}$.

How can one have a $K$ and $\mathbf{h}$ for something that is not a Gaussian?
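For reference, the canonical-form conversion for an ordinary (full-rank) Gaussian is easy to sanity-check numerically. The following is a small sketch using NumPy with made-up dimensions and values; it is not part of the original question:

```python
import numpy as np

# Sketch: converting an ordinary multivariate Gaussian N(mu, Sigma)
# to canonical (information) form. Dimensions and values are made up.
rng = np.random.default_rng(0)
m = 3
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)   # a positive-definite covariance
mu = rng.standard_normal(m)

K = np.linalg.inv(Sigma)          # precision matrix K = Sigma^{-1}
h = K @ mu                        # information vector h = Sigma^{-1} mu

# Recovering the moment parameters round-trips:
assert np.allclose(np.linalg.solve(K, h), mu)
assert np.allclose(np.linalg.inv(K), Sigma)
```

The difficulty in the question is precisely that for a linear Gaussian CPD there is no invertible $\Sigma$ over the joint variables to plug into this conversion.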


#### Best Answer

I have an answer, found with help from two technical reports (I can only post one link; I will post the other one in the comments) [1], [2]. The report [1] only showed the univariate case. Here is my attempt at the multivariate case.

The basic idea is to use Bayes' law: $p(Y \mid X) = \frac{p(Y,X)}{p(X)}$

We know from [2] that the joint of the linear Gaussian is:

$p(X,Y) = \mathcal{N}\left( \begin{pmatrix} \boldsymbol{\mu}_X \\ B \boldsymbol{\mu}_X + \mathbf{a} \end{pmatrix}, \Sigma_{X,Y} \right)$

With the process noise described by $\Sigma_w$, we have

$\Sigma_{X,Y} = \begin{pmatrix} B^T \Sigma_w^{-1} B + \Sigma_X^{-1} & -B^T \Sigma_w^{-1} \\ -\Sigma_w^{-1} B & \Sigma_w^{-1} \end{pmatrix}^{-1} = \begin{pmatrix} \Sigma_X & \Sigma_X B^T \\ B \Sigma_X & \Sigma_w + B \Sigma_X B^T \end{pmatrix}$
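This block identity can be checked numerically. Here is a small sketch with made-up dimensions ($n=2$, $m=3$) and random positive-definite $\Sigma_X$, $\Sigma_w$; none of these values come from the original post:

```python
import numpy as np

# Check: the joint precision (left) is the inverse of the joint
# covariance (right) for Y = a + B X + w, Cov(X)=Sigma_X, Cov(w)=Sigma_w.
rng = np.random.default_rng(1)
n, m = 2, 3

def spd(d):
    """A random symmetric positive-definite d x d matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma_X, Sigma_w = spd(n), spd(m)
B = rng.standard_normal((m, n))
Kw = np.linalg.inv(Sigma_w)
KX = np.linalg.inv(Sigma_X)

# Joint precision matrix (the matrix being inverted on the left)
K_joint = np.block([[B.T @ Kw @ B + KX, -B.T @ Kw],
                    [-Kw @ B,            Kw      ]])

# Joint covariance in moment form (right-hand side)
Sigma_joint = np.block([[Sigma_X,     Sigma_X @ B.T],
                        [B @ Sigma_X, Sigma_w + B @ Sigma_X @ B.T]])

assert np.allclose(np.linalg.inv(K_joint), Sigma_joint)
```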

Now, to get $p(Y \mid X)$, we divide it by $p(X)$, which in canonical form is

$K_X = \Sigma_X^{-1}$ and $\mathbf{h}_X = K_X \boldsymbol{\mu}_X$

Dividing it out gives us:

$K_{Y|X} = \begin{pmatrix} B^T \Sigma_w^{-1} B & -B^T \Sigma_w^{-1} \\ -\Sigma_w^{-1} B & \Sigma_w^{-1} \end{pmatrix}, \quad \mathbf{h}_{Y|X} = \mathbf{0}, \quad g_{Y|X} = -\log\left((2\pi)^{n/2} |\Sigma_w|^{1/2}\right)$

with $n$ the dimension of the Gaussian (that is, $m$ in the question's notation, since $\Sigma_w$ is $m \times m$).

Note that I assumed zero-mean process noise and also took $\mathbf{a} = \mathbf{0}$.

The result in canonical form is not a valid Gaussian, as $K_{Y|X}$ is not invertible (its rank is at most $m$). Multiplying it by $p(X)$, however, gives back a valid Gaussian, as one would expect.
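The division and re-multiplication steps can also be sketched numerically: dividing by $p(X)$ in canonical form just subtracts $K_X$ from the $X$-block of the joint precision, leaving a singular $K_{Y|X}$, and multiplying $p(X)$ back in restores the full-rank joint precision. Again, the dimensions and values below are made up, not from the original post:

```python
import numpy as np

# Canonical-form division/multiplication for the linear Gaussian CPD.
rng = np.random.default_rng(2)
n, m = 2, 3

def spd(d):
    """A random symmetric positive-definite d x d matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma_X, Sigma_w = spd(n), spd(m)
B = rng.standard_normal((m, n))
Kw = np.linalg.inv(Sigma_w)
K_X = np.linalg.inv(Sigma_X)

K_joint = np.block([[B.T @ Kw @ B + K_X, -B.T @ Kw],
                    [-Kw @ B,            Kw      ]])

# Dividing by p(X): subtract K_X, embedded in the (n+m)-dim X,X block
K_X_full = np.zeros((n + m, n + m))
K_X_full[:n, :n] = K_X
K_cond = K_joint - K_X_full        # this is K_{Y|X}

# K_{Y|X} is rank-deficient, so it is not a valid Gaussian on its own ...
assert np.linalg.matrix_rank(K_cond) == m   # rank m, not n+m

# ... but multiplying p(X) back in (adding K_X) gives a valid Gaussian:
assert np.allclose(K_cond + K_X_full, K_joint)
assert np.all(np.linalg.eigvalsh(K_joint) > 0)  # positive definite
```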
