When we have a random variable $x$ with a probability density $p(x)$, and a function $y = f(x)$ that is differentiable and can be solved for $x = g(y)$, the change of variable formula leads us to a density for $y$ given by

$$
p(x) \, dx = p(x) \left| g'(y) \right| \, dy = p(x) \left| \frac{1}{f'(x)} \right| \, dy = p(x) \left| \frac{dx}{dy} \right| \, dy
$$

where $\frac{dx}{dy}$ is called (to my knowledge even in the univariate case) the Jacobian of the transformation (as in Zill & Wright, p. 792). In general this would be the determinant of a Jacobian matrix $\mathbf{J}(\mathbf{g}(\mathbf{y}))$. But I never understood why it enters in absolute value. I have read somewhere that it's because $f(x)$ could have a negative derivative whereas probabilities are confined to be positive, but that sounds more like a post-hoc justification than a mathematical result. Is there a way to derive this fact?


#### Best Answer

For a specific example, in addition to @whuber's advice, let $y=f(x)=-2x$ and $x=g(y)=-y/2$, with support $x \in [0,1]$. Then $y$ lies in the range $[-2,0]$. Also, we have $g'(y)=-1/2$ and $f'(x)=-2$.

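Normally, when using the formula, you'd take the integral $\int p(g(y))\left|\frac{dx}{dy}\right|dy$ from $-2$ to $0$. However, the substitution itself actually gives limits from $0$ to $-2$: as $x$ runs from $0$ to $1$, $y$ runs from $0$ down to $-2$, since the $x$ and $y$ directions differ. Because $\frac{dx}{dy}$ is negative here, swapping the limits turns it into its absolute value, i.e. $$\int_{0}^{-2} p(g(y))\frac{dx}{dy}dy=\int_{-2}^{0} p(g(y))\left(-\frac{dx}{dy}\right)dy=\int_{-2}^{0} p(g(y))\left|\frac{dx}{dy}\right|dy$$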
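The same sign flip can also be seen through the cumulative distribution function. This is only a sketch, assuming $f$ is strictly decreasing and differentiable with inverse $g$; the symbols $F_X$, $F_Y$, $p_X$, $p_Y$ are notation introduced here to keep the two variables apart and are not in the question:

$$
F_Y(y) = P(f(X) \le y) = P(X \ge g(y)) = 1 - F_X(g(y))
$$

$$
p_Y(y) = \frac{d}{dy} F_Y(y) = -p_X(g(y)) \, g'(y) = p_X(g(y)) \left| g'(y) \right|, \quad \text{since } g'(y) < 0
$$

For a strictly increasing $f$ the inequality does not flip, $g'(y) > 0$, and the same expression results, so the absolute value covers both cases at once.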

Using the absolute value removes the need to keep track of the *inverse* directions (i.e. $x$ and $y$ running in opposite directions, which is what a negative derivative reflects).
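A quick numerical check of the example may help. This is a minimal sketch, not part of the original answer: the density $p(x) = 2x$ on $[0,1]$ is an assumption chosen only for illustration, and `p_x`, `g`, `G_PRIME`, and `integrate` are hypothetical helper names.

```python
def p_x(x):
    """Assumed example density on [0, 1]: p(x) = 2x (illustrative choice)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def g(y):
    """Inverse of y = f(x) = -2x, as in the example above."""
    return -y / 2.0


G_PRIME = -0.5  # g'(y) = dx/dy, constant for this linear map


def integrate(func, a, b, n=100_000):
    """Midpoint Riemann sum from a to b; the step is negative when b < a."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Ascending limits with |dx/dy|: a valid density that integrates to about 1.
with_abs = integrate(lambda y: p_x(g(y)) * abs(G_PRIME), -2.0, 0.0)

# Ascending limits with the signed derivative: total "probability" is about -1.
without_abs = integrate(lambda y: p_x(g(y)) * G_PRIME, -2.0, 0.0)

# Signed derivative with the limits the substitution really gives (0 down to -2).
reversed_limits = integrate(lambda y: p_x(g(y)) * G_PRIME, 0.0, -2.0)

print(with_abs, without_abs, reversed_limits)
```

The first and third integrals agree, which is the identity in the answer: the signed derivative with reversed limits equals the absolute value with ascending limits.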
