08 August 2008

Parallel Sampling

I've been thinking a lot recently about how to do MCMC on massively parallel architectures, for instance in a (massively) multi-core setup (either with or without shared memory).

There are several ways to approach this problem.

The first is the "brain dead" approach. If you have N-many cores, just run N-many parallel (independent) samplers. Done. The problem here is that if N is really like (say 1000 or greater), then this is probably a complete waste of space/time.

The next approach works if you're doing something like (uncollapsed) Gibbs sampling. Here, the Markov blankets usually separate in a fairly reasonable way. So you can literally distribute the work precisely as specified by the Markov blankets. With a bit of engineering, you can probably do this is a pretty effective manner. The problem, of course, is if you have strongly overlapping Markov blankets. (I.e., if you don't have good separation in the network.) This can happen either due to model structure, or due to collapsing certain variables. In this case, this approach just doesn't work at all.

The third approach---and the only one that really seems plausible---would be to construct sampling schemes exclusively for a massively parallel architecture. For instance, if you can divide your variables in some reasonable way, you can probably run semi-independent samplers that communicate on an as-needed basis. The form of this communication might, for instance, look something like an MH-step, or perhaps something more complex.

At any rate, I've done a bit of a literature survey to find examples of systems that work like this, but have turned up surprisingly little. I can't imagine that there's that little work on this problem, though.

15 comments:

I am who I am said...
This post has been removed by the author.
David Hall said...

Regarding #2, it depends on what you mean "doesn't work at all". Taking a look at books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf , doing an approximate variant of Gibbs seems to work quite well for LDA. In my own experiments, I take further liberties, and the results are usually fairly encouraging.

Granted, as the number of processors grows large (and the size of the data fails to grow with it), I'm sure the effects of "cheating" at Gibbs sampling will be more noticeable.

Regardless, I'd be very interested to see a variant of MCMC like you proposed in #3.

David Hall said...

(also, sorry about the first post. I'm not sure how my name got mangled into that...)

Mark Johnson said...

I haven't seriously looked at this, so take these comments with a grain of salt, but I suspect that some kind of particle filter may turn out to parallelize better than MCMC approaches that try to maintain a single chain. I suspect you don't need global communication to ensure that the particle population is distributed according to the desired population, but that e.g., pairwise communication between random pairs is sufficient. Cappe and Robert's work on adaptive importance sampling and population monte carlo seems very promising here.

hal said...

David -- I didn't mean to actually imply that it doesn't work :)... just that it seems unlikely to scale dramatically without incurring significant penalties.

Mark -- I've thought about particle filtering, but the worry here is that in order for pfilters to be competitive, you usually have to do an MCMC resampling pass, which then won't distribute :).

Mark Johnson said...

Hi Hal,

While the standard presentations of particle filtering assume a global resampling step, I think that particle filters are still correct (but maybe less efficient) if this resampling is only done over subsets of the particles.

Imagine you were running n sets of particle filters in parallel; each of the sets of particles is a sample from the desired distribution, but of course a global resampling step isn't necessary.

Moreover, since each set of particles is a sample from the same distribution, permuting the particles across the filters doesn't change the distributions, but may promote faster mixing.

This leads to the idea in my first message; after each update we randomly assign particles to groups, and resample only locally within each group.

hal said...

Hi Mark -- In fact, you don't need to do any resampling to be correct -- you just get much better estimates if you do. I think you can probably go one step further... something like:

1. Split data across processors.

2. Run independent particle filters on each processor for some number of data points (say, 100). This can involve MCMC moves, etc., but each is essentially running independently.

3. Randomly shuffle particles between processors. Suppose we have 1000 processors and each gets 100 particles. Then accumulate the 100k particles and randomly re-distribute them to processors. (Can be done much more efficiently so that you don't actually need a "host" to collect them.)

4. Each processor runs a new set of MCMC moves on its particles.

5. Go to 2.

I'd have to write it down to verify, but at least intuitively it seems like this would still leave us with samples from the true posterior. The shuffling is done essentially to ensure some communication between the different subsets of data (which somehow seems important.)

Of course, we might not actually want to do particle filtering but rather some batch method. Is there an easy pfilter -> batch conversion? (I suspect there should be, but haven't worked through it.)

Mark Johnson said...

Hi Hal,

Yes, what you sketch is the kind of thing I have in mind. Particle filtering is usually viewed as an on-line procedure, but it doesn't have to be; the "adaptive importance sampling" work I mentioned earlier develops this idea.

Mark

Anonymous said...

Hello,

I found your site/blog interesting and would like to buy an ad on your page:
http://nlpers.blogspot.com/

The ad would be the following website [www.interpretersunlimited.com]

I don't have very big budget, but I'm sure we could arrange a reasonable price.

If you're interested, please get back to me(jules@bestrank.com).

Thanks!

Radford said...

Hi,

You might be interested in my paper on "circular coupling". Among other things, this technique allows one to simulate a single Markov chain in parallel, by stiching together parts done on different processors to create a single coherent Markov chain. (Actually a circular one, as you can guess from the title.)

You can get it from here.

Radford Neal

shiv said...

Hi Sir,

I am doing MTech from Indian Institute of Technology Bombay. I am doing a project on speech summarization. Is there any open source tool available for performing speech summarization to which i can make modifications since i have not more than 6 months for this project and i guess starting from scratch would be tough in such a short span.

Can you give me some idea as to how should i start it.

Thanks & Regards
T. Shivkiran
shivkiran@cse.iitb.ac.in

Anonymous said...

I always heard something from my neighbor that he sometimes goes to the internet bar to play the game which will use him some habbo credits,he usually can win a lot of habbo gold,then he let his friends all have some habbo coins,his friends thank him very much for introducing them the cheap habbo credits,they usually buy habbo gold together.

cutepig said...

I think many players need the ro zeny, so more and more people beginning to buy ro zeny, but we can not buy the cheap ro zeny, this problem for us is very hard, so I hope who can tell me buy the ragnarok zeny.

Anonymous said...

網頁設計,情趣用品,情趣用品,情趣用品,情趣用品
色情遊戲,寄情築園小遊戲,情色文學,一葉情貼圖片區,情惑用品性易購,情人視訊網,辣妹視訊,情色交友,成人論壇,情色論壇,愛情公寓,情色,舊情人,情色貼圖,色情聊天室,色情小說,做愛,做愛影片,性愛

免費視訊聊天室,aio交友愛情館,愛情公寓,一葉情貼圖片區,情色貼圖,情色文學,色情聊天室,情色小說,情色電影,情色論壇,成人論壇,辣妹視訊,視訊聊天室,情色視訊,免費視訊,免費視訊聊天,視訊交友網,視訊聊天室,視訊美女,視訊交友,視訊交友90739,UT聊天室,聊天室,豆豆聊天室,尋夢園聊天室,聊天室尋夢園,080聊天室,080苗栗人聊天室,女同志聊天室,上班族聊天室,小高聊天室 

AV,AV女優
視訊,影音視訊聊天室,視訊交友
視訊,影音視訊聊天室,視訊聊天室,視訊交友,視訊聊天,視訊美女

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花