Each distribution is represented with an array of arrays with PMF values.

UPD 1: I have $P=(p_1, … , p_n)$ where $P$ is a distribution of distributions and $p_i=(p_i^1, …, p_i^m)$. My task is to compute $D_{KL}(P, Q)$.

UPD 2: Each $p_i$ is a PMF, and $\sum_j p_i^j=1$ for each $i$.


#### Best Answer

The KL divergence does not depend on the dimensionality of the distribution, since the value of a pmf is always a scalar (i.e., what would it mean if $P(X = k)$ were a vector?).

What I mean is, the integral/summation in the KL divergence is with respect to $\mathbf{x}$, not $\theta$. For two distributions $p(\mathbf{x})$ and $q(\mathbf{x})$, you can write:

$$D_{KL}(p\|q) = \int_{\mathcal{X}} p(\mathbf{x})\log\frac{p(\mathbf{x})}{q(\mathbf{x})}\,d\mathbf{x}$$
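
For PMFs the integral becomes a sum over all outcomes $\mathbf{x}$. Here is a minimal NumPy sketch of that sum. It assumes the whole array of arrays is one joint PMF whose entries sum to 1, which is my reading and not something the question states. The function name `kl_divergence` and the example values are made up for illustration.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) for pmfs stored as arrays of any shape.

    Assumption: p and q are each a single pmf (all entries together sum to 1).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must have the same shape")
    # Outcomes with p(x) = 0 contribute 0 by convention, so skip them.
    mask = p > 0
    # If q(x) = 0 somewhere p(x) > 0, the divergence is infinite.
    if np.any(q[mask] == 0):
        return float("inf")
    # Sum over every outcome x, regardless of how the array is shaped.
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical joint pmfs over a 2 x 2 grid of outcomes.
p = np.array([[0.1, 0.2], [0.3, 0.4]])
q = np.array([[0.25, 0.25], [0.25, 0.25]])

kl_2d = kl_divergence(p, q)                    # array-of-arrays layout
kl_flat = kl_divergence(p.ravel(), q.ravel())  # same pmfs, flattened
```

Both calls give the same number, because the sum runs over outcomes and the array layout is only bookkeeping. If each row is instead its own PMF, as in UPD 2, this function would have to be applied row by row, and how to combine the per-row results depends on what $D_{KL}(P, Q)$ is meant to be.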