(The contents of this post are largely due to a conversation with Percy Liang at ACL.)

I'm a big fan of Gibbs sampling for Bayesian problems, just because it's so darn easy. The standard setup for Gibbs sampling over a space of variables a,b,c (I'll assume there are no exploitable independences) is:

- Draw a conditioned on b,c
- Draw b conditioned on a,c
- Draw c conditioned on a,b

- Draw a,b conditioned on c
- Draw c conditioned on a,b

- Draw a conditioned on c
- Draw b conditioned on a,c
- Draw c conditioned on a,b

- Draw a conditioned on c
- Draw c conditioned on a
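
To make two of these schemes concrete, here is a minimal sketch on a toy problem. The first list above (a, b, c in turn) is the standard sweep, and the list just above (a given c, then c given a) is what you get once b has been summed out. The joint table `WEIGHTS` and all function names are made up for illustration; nothing here comes from a real model.

```python
import random

# Hypothetical unnormalized joint over three binary variables (a, b, c).
WEIGHTS = {
    (0, 0, 0): 4.0, (0, 0, 1): 1.0, (0, 1, 0): 1.0, (0, 1, 1): 2.0,
    (1, 0, 0): 2.0, (1, 0, 1): 1.0, (1, 1, 0): 1.0, (1, 1, 1): 4.0,
}


def draw(weights, rng):
    """Draw an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def standard_sweep(state, rng):
    """One sweep over [a, b, c]: a | b,c then b | a,c then c | a,b."""
    for i in range(3):
        candidates = []
        for value in (0, 1):
            proposal = list(state)
            proposal[i] = value
            candidates.append(WEIGHTS[tuple(proposal)])
        state[i] = draw(candidates, rng)
    return state


def collapsed_weight(a, c):
    """Unnormalized marginal of (a, c) with b summed out."""
    return WEIGHTS[(a, 0, c)] + WEIGHTS[(a, 1, c)]


def collapsed_sweep(state, rng):
    """One sweep over [a, c] only: a | c then c | a."""
    a, c = state
    a = draw([collapsed_weight(v, c) for v in (0, 1)], rng)
    c = draw([collapsed_weight(a, v) for v in (0, 1)], rng)
    return [a, c]


def estimate_p_a_equals_c(sweep, state, n_sweeps, seed=0):
    """Fraction of sweeps after which a == c (c is the last entry of state)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sweeps):
        state = sweep(state, rng)
        hits += state[0] == state[-1]
    return hits / n_sweeps


if __name__ == "__main__":
    print(estimate_p_a_equals_c(standard_sweep, [0, 0, 0], 20000))
    print(estimate_p_a_equals_c(collapsed_sweep, [0, 0], 20000))
```

Both chains target the same distribution over (a, c), so both estimates should land near the exact value 10/16 = 0.625 for this table. What differs between them is how quickly they get there, which a toy this small does not show.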

- For each word token, draw a tag for that word conditioned on the word itself, the tag to the left, and the "probability of word given tag" parameters.
- For each tag type (not token), draw a multinomial parameter vector for "probability of word given tag" conditioned on the current assignment of tags to words.

- For each word token, draw a tag for that word conditioned on the word itself, the tag to the left, and all other current assignments of tags to words.
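
As a rough sketch of how the two tagging samplers differ, here is just the emission part for a single tag. It assumes a symmetric Dirichlet prior with concentration `beta` on each "probability of word given tag" vector (the prior, the names, and the counts are my assumptions for illustration). The factor for the tag to the left is left out.

```python
import random
from collections import Counter


def draw_emission_params(counts, vocab, beta, rng):
    """Uncollapsed step: draw p(word | tag) from its Dirichlet posterior.

    A Dirichlet draw is a vector of independent Gamma draws, normalized.
    """
    gammas = [rng.gammavariate(counts[w] + beta, 1.0) for w in vocab]
    total = sum(gammas)
    return {w: g / total for w, g in zip(vocab, gammas)}


def collapsed_emission_prob(counts, vocab, beta, word):
    """Collapsed step: p(word | tag, all other assignments).

    The parameter vector is integrated out, so only counts remain.
    `counts` must exclude the token currently being resampled.
    """
    n_tag = sum(counts[w] for w in vocab)
    return (counts[word] + beta) / (n_tag + beta * len(vocab))


if __name__ == "__main__":
    vocab = ["the", "dog", "runs"]
    counts = Counter({"the": 8, "dog": 1})  # hypothetical counts for one tag
    rng = random.Random(0)
    print(draw_emission_params(counts, vocab, 0.5, rng))
    print(collapsed_emission_prob(counts, vocab, 0.5, "the"))
```

The collapsed probability is the posterior mean of the uncollapsed draw, so averaged over many draws the two agree. The difference is that the uncollapsed sampler commits to one particular draw for a whole pass over the tokens.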

The point of this post is to acknowledge that collapsing may not always be the right choice. In fact, it's sort of obvious in retrospect. There are many models for which auxiliary variables are added just to make the sampling easier. This is, in effect, un-collapsing the sampler. If "always collapse" were a good rule to follow, then people would never add auxiliary variables.

While this is a convincing argument (for me, at least), it's not particularly intuitive. I think the intuition comes from comparing the mixing rates of the Markov chains specified by the standard Gibbs sampler and the collapsed Gibbs sampler. Essentially, what seems to happen with a collapsed sampler is that the variance of the Markov chain decreases. In the tagging example, consider a frequent word. In the collapsed setting, the chance that the tag for a single token of this word will change in a Gibbs step is roughly inversely proportional to the word's term frequency, because all the other tokens of that word vote for their current tags. This means that the collapsed sampler is going to have a tendency to get stuck (and this is exactly what Mark's results seem to suggest). On the other hand, in the uncollapsed case, it is reasonably plausible that a large number of tags could change for a single word type "simultaneously" due to a slightly different draw of the "p(word|tag)" parameter vector.
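
Here is a back-of-the-envelope version of the "inversely proportional" claim. It is a toy calculation under assumptions that are mine, not the post's: two tags with equal total counts, a symmetric Dirichlet prior `beta` on the emissions, every other token of the word currently carrying the same tag, and transition factors ignored.

```python
def switch_probability(n_tokens, beta):
    """Chance that one token of a word leaves the tag its other tokens share.

    Toy assumptions: two tags with equal total counts (so the denominators
    of the collapsed emission terms cancel), the other n_tokens - 1 tokens
    of the word all carry the current tag, and transitions are ignored.
    """
    stay = (n_tokens - 1) + beta  # collapsed weight for the current tag
    move = beta                   # collapsed weight for the other tag
    return move / (stay + move)


for n in (1, 10, 100, 1000):
    print(n, switch_probability(n, beta=0.5))
```

With `beta = 0.5` the probability works out to exactly 0.5 / n, so a word with a thousand tokens changes any one of its tags about once in two thousand tries.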

(Interestingly, in the case of LDA, the collapsed sampler is the standard approach and my sense is that it is actually somehow not causing serious problems here. But I actually haven't seen experiments that bear on this.)

## 10 comments:

Yes, I think the issue is tricky. I had assumed that collapsing was always better since there are fewer variables to sample, but as you point out, if that were the case then introducing auxiliary variables should never help. (Of course, there are other reasons for introducing auxiliary variables; perhaps the distribution you're interested in is more easily expressed as the marginal of some more complex distribution.)

Anyway, that EMNLP paper showed that alternating maximization (of the kind used by EM and Variational Bayes) seems to do better than collapsed Gibbs. Percy noted that an uncollapsed Gibbs has an alternating structure similar to EM and VB, so perhaps it will do as well as those other algorithms? Anyway, it's on my list of things to try real soon, but the uncollapsed Gibbs is actually harder to implement than collapsed Gibbs (see my NAACL 07 paper for how to do this for PCFGs).

Sorry to monopolize this comment section, but I just implemented the uncollapsed Gibbs sampler for HMMs and it seems to do much better than collapsed Gibbs; in fact, it seems to be about the same as Variational Bayes.

These results are only preliminary (I'm having trouble getting time on the cluster; our summer interns sure know how to burn cycles!), and as I noted in my EMNLP paper, you really need multiple runs with a large number of iterations to be sure of anything, but so far it looks good.

The Gibbs sampler for LDA (according to Griffiths and Steyvers) includes auxiliary variables for the topics (z) and then integrates out the multinomials theta and phi.

Maybe I am getting something wrong, but it seems like LDA is collapsing and un-collapsing at the same time.

My intuition is that one should integrate out complex distributions (Dirichlets, multinomials, ...) and include discrete indicator variables.

Laura: I don't know of a formal definition of a "collapsed Gibbs sampler"; I think of a Gibbs sampler as "collapsed" whenever some of the variables in the model are integrated out rather than sampled. So I guess in principle there may be several different ways of constructing collapsed Gibbs samplers from a given model, depending on which variables you intend to integrate out and which ones you intend to sample.

By the way, Sharon Goldwater visited Microsoft Research to give another of her cool talks on Monday, and I decided it was time to figure out what was going on between her ACL 07 paper and my EMNLP 07 paper.

Sharon found that (collapsed) Gibbs did much better than EM on unsupervised POS tagging, while I found the reverse. Anyway, one big difference is in the size of the problems: Sharon was working with a 24K word subset of the PTB, while I was working with all 1M words, and Sharon was working with a reduced set of 17 tags, while I was working with all 45 tags.

So I made up a corpus that looked as much like Sharon's as I could, and guess what: collapsed Gibbs works like a charm! (I was relieved, as it means I don't have a bug in my sampler). Variational Bayes, which with the full corpus worked best of all, didn't do as well, but uncollapsed Gibbs seems to work best of all. (These are still preliminary results, so take them with a grain of salt).

My post-hoc rationalization of this is that with the smaller data set the posterior is markedly less peaked, and the Gibbs samplers are really sampling from the posterior, while VB is estimating an approximation to that posterior. In my EMNLP paper I was using a much larger data set, and the collapsed Gibbs sampler has mobility problems with large data sets, hence its poor results. Also, with large data sets the posterior is much more peaked, so the Variational Bayes approximation is much more accurate.
