Comments on natural language processing blog: Collapsed Gibbs

Anonymous (2007-07-18 08:24):

Laura: I don't know of a formal definition of a "collapsed Gibbs sampler"; I think of a Gibbs sampler as "collapsed" whenever some of the variables in the model are integrated out rather than sampled. So in principle there may be several different ways of constructing collapsed Gibbs samplers from a given model, depending on which variables you intend to integrate out and which ones you intend to sample.

By the way, Sharon Goldwater visited Microsoft Research to give another of her cool talks on Monday, and I decided it was time to figure out what was going on between her ACL 07 paper and my EMNLP 07 paper.

Sharon found that (collapsed) Gibbs did much better than EM on unsupervised POS tagging, while I found the reverse. One big difference is the size of the problems: Sharon was working with a 24K-word subset of the PTB and a reduced set of 17 tags, while I was working with all 1M words and all 45 tags.

So I made up a corpus that looked as much like Sharon's as I could, and guess what: collapsed Gibbs works like a charm! (I was relieved, as it means I don't have a bug in my sampler.) Variational Bayes, which worked best of all with the full corpus, didn't do as well here, and uncollapsed Gibbs seems to work best of all. (These are still preliminary results, so take them with a grain of salt.)
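To make the "integrated out rather than sampled" idea concrete, here is a minimal sketch (my own illustration, not code from the commenters) of one collapsed Gibbs sweep for a two-component mixture of Bernoullis. The Beta-distributed component parameters and mixing weights are integrated out analytically, so only the assignments z are sampled; all names and hyperparameters here are made up for the example.

```python
import numpy as np

def collapsed_gibbs_sweep(x, z, a=1.0, b=1.0, c=1.0, rng=None):
    """One collapsed Gibbs sweep for a 2-component Bernoulli mixture.

    theta_k ~ Beta(a, b) and the mixing weights (Beta(c, c) prior) are
    integrated out; only the assignments z are resampled.
    """
    rng = rng or np.random.default_rng()
    for i in range(len(x)):
        z[i] = -1  # remove point i from the counts
        probs = np.empty(2)
        for k in (0, 1):
            m_k = np.sum(z == k)                # other points in component k
            ones = np.sum((z == k) & (x == 1))  # successes among them
            # predictive P(x_i | x_-i, z) with theta_k integrated out
            pred = (ones + a) / (m_k + a + b)
            lik = pred if x[i] == 1 else 1.0 - pred
            probs[k] = (m_k + c) * lik          # mixing weights integrated out
        probs /= probs.sum()
        z[i] = rng.choice(2, p=probs)
    return z

# tiny usage example with made-up data
rng = np.random.default_rng(0)
x = np.array([1, 1, 1, 0, 0, 0, 1, 0])
z = rng.integers(0, 2, size=len(x))
for _ in range(20):
    z = collapsed_gibbs_sweep(x, z, rng=rng)
print(z)
```

Note that a different choice of which variables to integrate out (e.g. keeping the theta_k and integrating out z) would give a different, equally valid sampler for the same model, which is the point made above.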
My post-hoc rationalization of this is that with the smaller data set the posterior is markedly less peaked, and the Gibbs samplers are really sampling from the posterior, while VB is only estimating an approximation to it. In my EMNLP paper I was using a much larger data set, and the collapsed Gibbs sampler has mobility problems with large data sets, hence its poor results. Also, with large data sets the posterior is much more peaked, so the Variational Bayes approximation is much more accurate.

Anonymous (2007-07-13 09:44):

The Gibbs sampler for LDA (according to Griffiths and Steyvers) includes auxiliary variables for the topic assignments (z) and then integrates out the multinomials theta and phi.

Maybe I am getting something wrong, but it seems like LDA is collapsing and un-collapsing at the same time.

My intuition is that one should integrate out complex distributions (Dirichlets, multinomials, ...) and keep discrete indicator variables.

Anonymous (2007-07-11 11:11):

Sorry to monopolize this comment section, but I just implemented the uncollapsed Gibbs sampler for HMMs and it seems to do much better than collapsed Gibbs; in fact, it seems to be about the same as Variational Bayes.
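An uncollapsed Gibbs sampler for an HMM of the kind just described alternates between sampling the transition and emission parameters from their Dirichlet posteriors given the current state sequence, and resampling the hidden states given those parameters. Here is a minimal sketch assuming symmetric Dirichlet priors and a simple pointwise state update; the names and the pointwise update are my own simplifications, not necessarily what the commenter implemented.

```python
import numpy as np

def uncollapsed_gibbs_step(obs, states, K, V, alpha=1.0, beta=1.0, rng=None):
    """One uncollapsed Gibbs step for a K-state HMM over V symbols."""
    rng = rng or np.random.default_rng()
    T = len(obs)

    # count transitions and emissions under the current state sequence
    trans_counts = np.full((K, K), alpha)
    emit_counts = np.full((K, V), beta)
    for t in range(T - 1):
        trans_counts[states[t], states[t + 1]] += 1
    for t in range(T):
        emit_counts[states[t], obs[t]] += 1

    # sample parameters from their Dirichlet posteriors (the "uncollapsed" part)
    A = np.array([rng.dirichlet(row) for row in trans_counts])  # transitions
    B = np.array([rng.dirichlet(row) for row in emit_counts])   # emissions

    # resample each state pointwise given its neighbours and the parameters
    for t in range(T):
        p = B[:, obs[t]].copy()
        if t > 0:
            p *= A[states[t - 1], :]
        if t < T - 1:
            p *= A[:, states[t + 1]]
        p /= p.sum()
        states[t] = rng.choice(K, p=p)
    return states

# made-up toy data: 2 states, 3 symbols
rng = np.random.default_rng(1)
obs = np.array([0, 0, 1, 2, 2, 1, 0, 2])
states = rng.integers(0, 2, size=len(obs))
for _ in range(10):
    states = uncollapsed_gibbs_step(obs, states, K=2, V=3, rng=rng)
print(states)
```

The two alternating phases (sample parameters, then sample states) mirror the E-step/M-step structure of EM and VB, which is the observation attributed to Percy in the comment below.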
These results are only preliminary (I'm having trouble getting time on the cluster; our summer interns sure know how to burn cycles!), and as I noted in my EMNLP paper, you really need multiple runs with a large number of iterations to be sure of anything, but so far it looks good.

Anonymous (2007-07-10 08:28):

Yes, I think the issue is tricky. I had assumed that collapsing was always better, since there are fewer variables to sample, but as you point out, if that were the case then introducing auxiliary variables should never help. (Of course there are other reasons for introducing auxiliary variables; perhaps the distribution you're interested in is more easily expressed as the marginal of some more complex distribution.)

Anyway, that EMNLP paper showed that alternating maximization (of the kind used by EM and Variational Bayes) seems to do better than collapsed Gibbs. Percy noted that an uncollapsed Gibbs sampler has an alternating structure similar to EM and VB, so perhaps it will do as well as those other algorithms. It's on my list of things to try real soon, but uncollapsed Gibbs is actually harder to implement than collapsed Gibbs (see my NAACL 07 paper for how to do this for PCFGs).
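For reference, the collapsed sampler mentioned in the 2007-07-13 comment above (Griffiths and Steyvers' sampler for LDA, where theta and phi are integrated out and only the topic assignments z are sampled) uses the well-known update P(z_i = k | z_-i, w) proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta). A sketch with made-up variable names, not their actual code:

```python
import numpy as np

def resample_topic(d, w, i, z, ndk, nkw, nk, alpha, beta, V, rng):
    """Collapsed Gibbs update for one token in LDA, with the
    document-topic (theta) and topic-word (phi) multinomials
    integrated out; only z is sampled."""
    k_old = z[d][i]
    # remove this token from the counts
    ndk[d, k_old] -= 1
    nkw[k_old, w] -= 1
    nk[k_old] -= 1
    # collapsed conditional: (n_dk + alpha)(n_kw + beta)/(n_k + V*beta)
    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
    p /= p.sum()
    k_new = rng.choice(len(nk), p=p)
    # add the token back under its new topic
    z[d][i] = k_new
    ndk[d, k_new] += 1
    nkw[k_new, w] += 1
    nk[k_new] += 1
    return k_new

# made-up toy corpus: 2 documents over a 4-word vocabulary
docs = [[0, 1, 1, 2], [2, 3, 3, 0]]
K, V, alpha, beta = 2, 4, 0.5, 0.5
rng = np.random.default_rng(2)
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1
        nkw[z[d][i], w] += 1
        nk[z[d][i]] += 1
for _ in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            resample_topic(d, w, i, z, ndk, nkw, nk, alpha, beta, V, rng)
```

This illustrates the "collapsing and un-collapsing at the same time" point: z is an auxiliary indicator variable that is kept and sampled, while the continuous multinomial parameters are the ones integrated out.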