natural language processing blog: topic models

19 February 2009

Mixing of gibbs samplers for topic (and mixture) models

This is just a quick note because it's something that had somehow never occurred to me until a few days ago. If you've ever talked to me in person about the standard (collapsed) Gibbs samplers for topic models, you know that I get concerned that these things don't mix. And it's not just the generic "how do we know" feeling that can get levied at (almost) any sampler, but a very specific we know for certain that our Gibbs samplers for topic models aren't mixing. How do we know? Because, for the most part, they don't mode switch. You can figure this out quite easily by simply watching the topic assignments. (The standard collapsed samplers for Gaussian or Multinomial mixture models exhibit the same problem.) At least if you have a reasonable amount of data.

The reason this always bugged me is because we now have definitive evidence that these things aren't mixing in the sense of mode hopping, which leads me to worry that they might also not be mixing in other, less easy to detect ways.

Well, worry no more. Or at least worry less. The mode hopping is a red herring.

Maybe the following observation is obvious, but I'll say it anyway.

Let's take our topic model Gibbs sampler and introduce a new Metropolis-Hastings step. This MH step simply takes two topic indices (say topics i and j) and swaps them. It picks i and j uniformly at random from the (K choose 2) possibilities. It's easy to verify that the acceptance probability for this move will be one (the qs will cancel and the ps are identical), which means that it's really more like a Gibbs step than an MH step (in the sense that Gibbs steps are MH steps that are always accepted).

The observation is that (a) this doesn't actually change what the sampler is doing in any real, meaningful way -- that is, re-indexing things is irrelevant; but (b) you now cannot claim that the sampler isn't mode switching. It's mode switching a lot.

Sure, it might still have poor mixing properties for other reasons, but at least now there isn't this elephant-in-the-room reason it can't possibly be mixing.

So this is a totally "useless" practical observation (sometimes we like to try to exploit the fact that it's not mixing), but from a theoretical perspective it might be interesting. For instance, if you want to prove something about a fast mixing rate for these samplers (if you believe they are actually mixing fast, which I'm somewhat keen to believe), then you're not going to be able to prove anything if you don't make this trivial change to the sampler (because without it they're not really mixing fast). So it might have interesting theoretical consequences.

natural language processing blog

19 February 2009

Mixing of gibbs samplers for topic (and mixture) models

About Me

Labels

My Blog List

Blog Archive