I read on page 20 of this tutorial that the $KL$ divergence is invariant under affine transformations, but I think this is incorrect.
Say we have two 1D normal distributions $P_1(x) = \mathcal N(\mu_1, \sigma_1^2)$ and $P_2(x) = \mathcal N(\mu_2, \sigma_2^2)$, so that $$KL(P_1(x)\,\|\,P_2(x)) = E_1\left[\ln \frac{P_1(x)}{P_2(x)}\right] = \ln\frac{\sigma_2}{\sigma_1} + \frac{1}{2\sigma_2^2}\left(\sigma_1^2 + (\mu_1-\mu_2)^2\right) - \frac{1}{2}$$
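The closed form above can be coded directly. This is a minimal sketch; the function name `kl_normal` is hypothetical, and it takes standard deviations (not variances) as arguments.

```python
import math

def kl_normal(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats.

    sigma1 and sigma2 are standard deviations, not variances.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

# Example: KL(N(0, 1) || N(1, 4)) = ln 2 + 2/8 - 1/2, roughly 0.443
print(kl_normal(0.0, 1.0, 1.0, 2.0))
```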
If we define an affine transformation as $$x' = \mu_1 + \frac{1}{\sigma}(x - \mu_1)$$
We will have
$$P_1(x') = \sigma P_1\big(x = \mu_1 + \sigma(x' - \mu_1)\big) = \mathcal N\left(\mu_1, \frac{\sigma_1^2}{\sigma^2}\right)$$ and
$$P_2(x') = \sigma P_2\big(x = \mu_1 + \sigma(x' - \mu_1)\big) = \mathcal N\left(\mu_1 - \frac{1}{\sigma}(\mu_1 - \mu_2), \frac{\sigma_2^2}{\sigma^2}\right)$$
Then, the $KL$ divergence for the two transformed distributions is $$KL(P_1(x')\,\|\,P_2(x')) = E'_1\left[\ln \frac{P_1(x')}{P_2(x')}\right] = \ln\frac{\sigma_2}{\sigma_1} + \frac{1}{2\sigma_2^2}\left(\sigma^2\sigma_1^2 + (\mu_1-\mu_2)^2\right) - \frac{\sigma^2}{2}$$
So clearly, for such a simple case $KL$ divergence is not invariant.
However, the invariance of the $KL$ divergence under affine transformations is crucial for the proof in the tutorial I referred to.
So, have I misunderstood something?
EDIT:
I think part of my misunderstanding lies in the way that I calculate $P_1(x')$ and $P_2(x')$. So I will expand this part so others can see where I got it wrong.
$$P_1(x') = \sigma P_1(x) = \sigma P_1\big(\mu_1 + \sigma(x' - \mu_1)\big)$$
given that $$P_1(x) = \mathcal N(\mu_1, \sigma_1^2)$$
so,
$$\sigma P_1\big(\mu_1 + \sigma(x' - \mu_1)\big) = \sigma \frac{1}{\sqrt{2\pi}\,\sigma_1} e^{-\frac{1}{2\sigma_1^2}\left(\sigma(x' - \mu_1)\right)^2} = \frac{1}{\sqrt{2\pi}\,\frac{\sigma_1}{\sigma}} e^{-\frac{1}{2\frac{\sigma_1^2}{\sigma^2}}(x' - \mu_1)^2} = \mathcal N\left(\mu_1, \frac{\sigma_1^2}{\sigma^2}\right)$$
Then, in exactly the same way, I have $$P_2(x') = \sigma P_2\big(x = \mu_1 + \sigma(x' - \mu_1)\big) = \mathcal N\left(\mu_1 - \frac{1}{\sigma}(\mu_1 - \mu_2), \frac{\sigma_2^2}{\sigma^2}\right)$$
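A quick numerical sanity check of the two transformed densities is sketched below. The parameter values are arbitrary illustrative choices (not from the question), and the helper names `normal_pdf` and `transformed_pdf` are hypothetical.

```python
import math

def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of N(mu, var) at x; var is the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Illustrative parameter values (arbitrary, not taken from the question)
mu1, s1 = 1.0, 0.5    # P_1 = N(mu1, s1^2)
mu2, s2 = -2.0, 1.5   # P_2 = N(mu2, s2^2)
sigma = 3.0           # scale in x' = mu1 + (x - mu1) / sigma

def transformed_pdf(xp: float, mu: float, s: float) -> float:
    """Density of x' by change of variables: sigma * P(mu1 + sigma * (x' - mu1))."""
    x = mu1 + sigma * (xp - mu1)
    return sigma * normal_pdf(x, mu, s ** 2)

for xp in (0.0, 0.5, 1.0, 1.2):
    # P_1(x') should be N(mu1, s1^2 / sigma^2)
    assert math.isclose(transformed_pdf(xp, mu1, s1),
                        normal_pdf(xp, mu1, s1 ** 2 / sigma ** 2), rel_tol=1e-9)
    # P_2(x') should be N(mu1 - (mu1 - mu2) / sigma, s2^2 / sigma^2)
    assert math.isclose(transformed_pdf(xp, mu2, s2),
                        normal_pdf(xp, mu1 - (mu1 - mu2) / sigma, s2 ** 2 / sigma ** 2),
                        rel_tol=1e-9)
print("both transformed densities match the stated normals")
```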
Is there any problem with this?
Best Answer
There are a few mistakes in your math. For example, when you expand the expectation, it seems you dropped the integral and also the $P_1(x)$ term.
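For instance, plugging the transformed parameters from the question (means $\mu_1$ and $\mu_1 - \frac{1}{\sigma}(\mu_1 - \mu_2)$, variances $\sigma_1^2/\sigma^2$ and $\sigma_2^2/\sigma^2$) into the closed-form expression for two normals, every factor of $\sigma$ cancels:

$$
\begin{aligned}
KL\big(P_1(x')\,\|\,P_2(x')\big)
&= \ln\frac{\sigma_2/\sigma}{\sigma_1/\sigma}
 + \frac{\sigma_1^2/\sigma^2 + \left(\frac{\mu_1-\mu_2}{\sigma}\right)^2}{2\,\sigma_2^2/\sigma^2} - \frac{1}{2} \\
&= \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}
 = KL\big(P_1(x)\,\|\,P_2(x)\big).
\end{aligned}
$$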
Write $y(x) = mx + c$. Recall that $P(x)\,dx = P(y)\,dy$, i.e. probability mass is preserved under the change of variables. Since $dy/dx = m$, it follows that $P(x) = |m|\,P(y)$ (the absolute value matters when $m < 0$).
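A small numerical illustration of this change of variables is sketched below, assuming an arbitrary pair of example densities (a standard normal $P_X$ and a Laplace $Q_X$, chosen only for illustration) and a hypothetical map $y = -3x + 5$. Both densities are transformed with the $1/|m|$ factor and the two KL integrals are approximated with a plain midpoint rule; the helper names are hypothetical.

```python
import math

def kl_numeric(p, q, lo: float, hi: float, n: int = 100_000) -> float:
    """Approximate KL(p || q) = integral of p * ln(p / q) by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        px = p(x)
        if px > 0.0:  # the integrand is 0 where p vanishes
            total += px * math.log(px / q(x)) * h
    return total

def normal_pdf(x: float, mu: float, s: float) -> float:
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def laplace_pdf(x: float, mu: float, b: float) -> float:
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

# Arbitrary example densities for X: P_X = N(0, 1), Q_X = Laplace(1, 2)
def p_x(x: float) -> float:
    return normal_pdf(x, 0.0, 1.0)

def q_x(x: float) -> float:
    return laplace_pdf(x, 1.0, 2.0)

# Hypothetical affine map y = m*x + c; a negative m shows why |m| is needed
m, c = -3.0, 5.0

def p_y(y: float) -> float:
    return p_x((y - c) / m) / abs(m)

def q_y(y: float) -> float:
    return q_x((y - c) / m) / abs(m)

kl_x = kl_numeric(p_x, q_x, -12.0, 12.0)
# x in [-12, 12] corresponds to y in [c - 36, c + 36] because |m| = 3
kl_y = kl_numeric(p_y, q_y, c - 36.0, c + 36.0)
print(kl_x, kl_y)  # the two values should agree to within floating-point rounding
```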
Then we can go through this proof from Wikipedia, which shows that the KL divergence is invariant under such a transformation.
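A sketch of the standard change-of-variables argument, written for $y = mx + c$ with $m \neq 0$ and the densities $P_Y(y) = P_X(x)/|m|$, $Q_Y(y) = Q_X(x)/|m|$ (this notation is chosen here for illustration and may differ from the Wikipedia presentation):

$$
\begin{aligned}
KL(P_Y \,\|\, Q_Y) &= \int P_Y(y)\,\ln\frac{P_Y(y)}{Q_Y(y)}\,dy \\
&= \int \frac{P_X(x)}{|m|}\,\ln\frac{P_X(x)/|m|}{Q_X(x)/|m|}\,|m|\,dx \\
&= \int P_X(x)\,\ln\frac{P_X(x)}{Q_X(x)}\,dx = KL(P_X \,\|\, Q_X).
\end{aligned}
$$

The factor $1/|m|$ cancels inside the logarithm, and the remaining one is absorbed by $dy = |m|\,dx$; the same cancellation happens for any invertible, differentiable transformation, with $|m|$ replaced by the absolute value of the Jacobian.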