What does it mean to say that "a topic is a distribution on words"?

I was taking a machine learning course, and they said the following two phrases, which confuse me:

each document is a distribution on topics.


each topic is a distribution on words.

I was wondering if someone knew what that meant.

Here is a link to the notes:

Currently, this is how I interpret it:

Well, we are modeling a topic as a vector $u^{(i)}$ with the relative frequencies of each word, so it just specifies how often each word appears in a specific topic. Also, each document can approximately be thought of as a linear combination of these topic vectors, i.e. document $M_{j} \approx \sum_{i=1}^{r} w_{i} u^{(i)}$.

But I wasn't sure whether that was right, or how to bring the concept of "distribution" into it.

Typically, in the context of Latent Dirichlet Allocation (LDA, used for topic modeling), we assume that the documents come from a generative process. I'll avoid math notation. Look at this figure:


  • (1) Every topic is generated from a Dirichlet distribution of $V$ dimensions, where $V$ is the size of your vocabulary.

  • (2) For every document:

    • (2.1) Generate a distribution over topics from a Dirichlet distribution of $T$ dimensions, where $T$ is the number of topics in the corpus.
    • (2.2) For every word in the document:
      • (2.2.1) Choose a topic according to the distribution generated in (2.1).
      • (2.2.2) Choose a word according to the distribution corresponding to the chosen topic (generated in (1)).
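For readers who find code easier to follow than the list, here is a minimal Python/NumPy sketch of the generative process above. It is an illustration under stated assumptions, not a reference LDA implementation: the function name `generate_corpus`, the toy vocabulary, and the symmetric concentration values `alpha` and `beta` are hypothetical choices made for the example.

```python
import numpy as np


def generate_corpus(vocab, num_topics, doc_lengths, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative sketch).

    alpha and beta are hypothetical symmetric Dirichlet concentrations for the
    document-topic and topic-word distributions, respectively.
    """
    rng = np.random.default_rng(seed)
    V = len(vocab)

    # (1) Each topic is a distribution over the V vocabulary words: shape (T, V).
    topics = rng.dirichlet(np.full(V, beta), size=num_topics)

    thetas, docs = [], []
    for n_words in doc_lengths:
        # (2.1) Each document gets its own distribution over the T topics.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        words = []
        for _ in range(n_words):
            # (2.2.1) Choose a topic for this word position.
            z = rng.choice(num_topics, p=theta)
            # (2.2.2) Choose a word from the chosen topic's word distribution.
            w = rng.choice(V, p=topics[z])
            words.append(vocab[w])
        thetas.append(theta)
        docs.append(words)
    return topics, np.array(thetas), docs


# Usage with a made-up vocabulary: 2 topics, two documents of 8 and 5 words.
vocab = ["dog", "cat", "ball", "stock", "market", "price"]
topics, thetas, docs = generate_corpus(vocab, num_topics=2, doc_lengths=[8, 5])
print(topics.shape)  # (2, 6)
print(docs[0])
```

The sketch only runs the process forward; recovering topics and document-topic distributions from real documents (inference) is a separate problem not shown here.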

The rigorous mathematical explanation is here (section 3).

So each topic is a probability distribution over the words of the vocabulary (1): for any word, such as "dog", it gives the probability that this word appears when drawing from that topic.
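Concretely, a topic can be stored as a vector of probabilities indexed by the vocabulary. The sketch below assumes a hypothetical five-word vocabulary and made-up probabilities for a "pets"-like topic; it shows that the probability of "dog" is just one entry of the vector, and that generating words from the topic means sampling from this categorical distribution.

```python
import numpy as np

# A hypothetical topic over a tiny 5-word vocabulary (made-up numbers).
vocab = ["dog", "cat", "ball", "stock", "market"]
pets_topic = np.array([0.40, 0.30, 0.20, 0.05, 0.05])

# A topic is a valid categorical distribution: non-negative and summing to 1.
assert np.all(pets_topic >= 0) and np.isclose(pets_topic.sum(), 1.0)

# P(word = "dog" | this topic) is just one entry of the vector.
p_dog = pets_topic[vocab.index("dog")]
print(p_dog)  # 0.4

# Drawing words "from the topic" means sampling from this categorical.
rng = np.random.default_rng(0)
sample = rng.choice(vocab, size=10, p=pets_topic)
print(sample.tolist())
```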

And each document has a probability distribution over topics (2.1), which says from which topics the document is more likely to draw its words. We say that a document is a mixture of topics.
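Summing over the per-word topic choice, the probability that a given word $w$ appears in document $d$ is $p(w \mid d) = \sum_{t} \theta_{d,t}\,\phi_{t,w}$, where $\theta_d$ is the document's distribution over topics and $\phi_t$ is topic $t$'s distribution over words (the symbols $\theta$ and $\phi$ are notation introduced here, following common LDA usage). Read this way, it is close to the asker's $M_j = \sum_i w_i u^{(i)}$, except that the weights are non-negative and sum to 1, so the result is again a distribution over words. All numbers below are made up for illustration.

```python
import numpy as np

vocab = ["dog", "cat", "ball", "stock", "market", "price"]

# Hypothetical topic-word distributions phi: one row per topic, each row sums to 1.
phi = np.array([
    [0.35, 0.30, 0.25, 0.04, 0.03, 0.03],  # a "pets"-like topic
    [0.02, 0.02, 0.06, 0.30, 0.30, 0.30],  # a "finance"-like topic
])

# Hypothetical document-topic distribution theta: 70% pets, 30% finance.
theta = np.array([0.7, 0.3])

# p(w | d) = sum_t theta[t] * phi[t, w], i.e. a convex combination of the topic rows.
p_word_given_doc = theta @ phi

print(dict(zip(vocab, p_word_given_doc.round(3).tolist())))
```

For example, $p(\text{dog} \mid d) = 0.7 \cdot 0.35 + 0.3 \cdot 0.02 = 0.251$.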


  • A Dirichlet distribution of three dimensions produces samples such as [0.2, 0.4, 0.4] or [0.3, 0.3, 0.4]: non-negative vectors that sum to 1, which can therefore be used as the parameters of categorical distributions. This is why it is used to generate distributions over $V$ words (topics) and distributions over $T$ topics (one per document). See the left and right sides of the figure.
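The point about Dirichlet samples can be checked directly with NumPy's `Generator.dirichlet`. In this sketch the concentration parameters `[1, 1, 1]` and the seed are arbitrary choices: every sample is non-negative and sums to 1, so any one of them can be passed as the `p` argument of `Generator.choice` to draw categories.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three draws from a 3-dimensional Dirichlet with (arbitrary) concentration [1, 1, 1].
samples = rng.dirichlet([1.0, 1.0, 1.0], size=3)
print(samples.round(2))

# Each row lies on the probability simplex: non-negative entries summing to 1.
assert np.all(samples >= 0)
assert np.allclose(samples.sum(axis=1), 1.0)

# So a row can serve directly as the parameters of a categorical distribution.
draws = rng.choice(3, size=5, p=samples[0])
print(draws)
```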
