Is the machine learning community abusing “conditioned on” and “parametrized by”?

Say $X$ depends on $\alpha$. Rigorously speaking,

  • if $X$ and $\alpha$ are both random variables, we could write $p(X \mid \alpha)$;

  • however, if $X$ is a random variable and $\alpha$ is a parameter, we have to write $p(X; \alpha)$.
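As a concrete illustration of the two readings (a standard-normal-location example, not from the original post):

```latex
% Non-Bayesian view: \mu is a fixed (if unknown) parameter of the density
p(x;\, \mu) \;=\; \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^2\right)

% Bayesian view: \mu is itself a random variable with a prior p(\mu),
% so the same expression is read as a conditional and enters a joint:
p(x, \mu) \;=\; p(x \mid \mu)\, p(\mu)
```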

I have noticed several times that the machine learning community seems to ignore this difference and abuse the terms.

For example, take the famous LDA model, where $\alpha$ is a Dirichlet parameter rather than a random variable.

[Image: the LDA distribution written using $p(\theta \mid \alpha)$]

Shouldn't it be $p(\theta; \alpha)$? I see a lot of people, including the LDA paper's original authors, write it as $p(\theta \mid \alpha)$.

I think this is more about Bayesian vs. non-Bayesian statistics than about machine learning vs. statistics.

In Bayesian statistics, parameters are modelled as random variables, too. If you have a joint distribution for $X, \alpha$, then $p(X \mid \alpha)$ is a conditional distribution, no matter what the physical interpretation of $X$ and $\alpha$ is. If one considers only fixed $\alpha$s, or otherwise does not put a probability distribution over $\alpha$, the computations with $p(X; \alpha)$ are exactly the same as those with $p(X \mid \alpha)$.

Furthermore, one can at any point decide to extend a model with fixed values of $\alpha$ to one where there is a prior distribution over $\alpha$. To me at least, it seems strange that the notation for the distribution-given-$\alpha$ should change at that point, which is why some Bayesians prefer to use the conditioning notation even when they have not (yet?) bothered to define all parameters as random variables.
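The claim that the computations coincide can be checked numerically. The sketch below (a hypothetical illustration, using the Dirichlet from the LDA example and an arbitrary, assumed Gamma prior on $\alpha$ purely for demonstration) shows that the density evaluated as $p(\theta; \alpha)$ is the very same computation as the conditional factor $p(\theta \mid \alpha)$ in a hierarchical model:

```python
import numpy as np
from scipy.stats import dirichlet, gamma

alpha = np.array([2.0, 3.0, 5.0])   # fixed Dirichlet parameter
theta = np.array([0.2, 0.3, 0.5])   # a point on the probability simplex

# Non-Bayesian reading: p(theta; alpha), with alpha just a parameter.
p_semicolon = dirichlet.pdf(theta, alpha)

# Bayesian reading: place a prior p(alpha) (here an assumed independent
# Gamma(2, 1) prior on each component, chosen only for illustration) and
# form the joint p(theta, alpha) = p(theta | alpha) p(alpha).
# The conditional factor p(theta | alpha) is computed by the very same call.
p_conditional = dirichlet.pdf(theta, alpha)
p_alpha = np.prod(gamma.pdf(alpha, a=2.0))
p_joint = p_conditional * p_alpha

# Identical computations, regardless of which notation we attach to them.
assert np.isclose(p_semicolon, p_conditional)
print(p_semicolon, p_joint)
```

The only thing the prior changes is which *further* quantities (the joint, the posterior over $\alpha$) become available; the density-of-$X$-at-fixed-$\alpha$ itself is untouched.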

The argument about whether one can write $p(X; \alpha)$ as $p(X \mid \alpha)$ has also arisen in the comments of Andrew Gelman's blog post Misunderstanding the $p$-value. For example, Larry Wasserman held the opinion that $\mid$ is not allowed when there is no conditioning from a joint distribution, while Andrew Gelman held the opposite opinion.
