08 August 2008

Parallel Sampling

I've been thinking a lot recently about how to do MCMC on massively parallel architectures, for instance in a (massively) multi-core setup (either with or without shared memory).

There are several ways to approach this problem.

The first is the "brain dead" approach. If you have N-many cores, just run N-many parallel (independent) samplers. Done. The problem here is that if N is really large (say, 1000 or greater), then this is probably a complete waste of space/time.
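As a concrete (toy) sketch of this first approach, here's what N independent samplers might look like: a simple random-walk Metropolis sampler for a standard normal target, with one chain per worker process. The target, step size, and chain length are all made-up illustration values, not anything from a real model.

```python
# Hypothetical sketch of approach #1: N independent Metropolis chains,
# one per worker, on a toy target (a standard normal). The target,
# step size, and chain length are invented for illustration.
import math
import random
from multiprocessing import Pool

def log_target(x):
    return -0.5 * x * x  # log density of N(0, 1), up to a constant

def run_chain(seed, n_steps=5000, step=1.0):
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        prop = x + rng.uniform(-step, step)
        delta = log_target(prop) - log_target(x)
        # Standard Metropolis accept/reject.
        if delta >= 0 or rng.random() < math.exp(delta):
            x = prop
        total += x
    return total / n_steps  # this chain's estimate of the target mean

if __name__ == "__main__":
    with Pool(4) as pool:  # one independent chain per core
        means = pool.map(run_chain, range(4))
    print(sum(means) / len(means))  # pooled estimate, should be near 0
```

The chains never communicate, which is exactly the point (and the problem): with 1000 cores you get 1000 copies of the same burn-in and the same mixing behavior.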

The next approach works if you're doing something like (uncollapsed) Gibbs sampling. Here, the Markov blankets usually separate in a fairly reasonable way, so you can literally distribute the work precisely as specified by the Markov blankets. With a bit of engineering, you can probably do this in a pretty effective manner. The problem, of course, is if you have strongly overlapping Markov blankets (i.e., if you don't have good separation in the network). This can happen either due to model structure or due to collapsing certain variables. In this case, this approach just doesn't work at all.
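For concreteness, here's a toy sketch of this second approach on a model where the blankets do separate nicely: Gibbs sampling on a periodic Ising grid under a checkerboard coloring. Same-colored sites are conditionally independent given their (opposite-colored) Markov blankets, so each color class can be updated in one vectorized, parallelizable sweep. The grid size and coupling strength are made up; this is a sketch, not any particular system.

```python
# Toy sketch of Markov-blanket parallelism: "checkerboard" Gibbs
# sampling on a periodic Ising grid. Same-colored sites are
# conditionally independent given their neighbors, so each color class
# is updated in a single vectorized (parallelizable) step. Grid size
# and coupling beta are invented for illustration.
import numpy as np

def gibbs_sweep(spins, beta, rng):
    rows, cols = np.indices(spins.shape)
    for color in (0, 1):  # black sites, then white sites
        mask = (rows + cols) % 2 == color
        # Sum over the four neighbors -- each site's Markov blanket.
        nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
               + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nbr))  # P(s = +1 | blanket)
        flips = np.where(rng.random(spins.shape) < p_up, 1, -1)
        spins[mask] = flips[mask]
    return spins

rng = np.random.default_rng(0)
spins = rng.choice(np.array([-1, 1]), size=(32, 32))
for _ in range(100):
    spins = gibbs_sweep(spins, beta=0.3, rng=rng)
print(spins.mean())  # average magnetization after 100 sweeps
```

The point of the coloring is that every update within a sweep touches disjoint blankets; with strongly overlapping blankets no such coloring with few colors exists, and the parallelism evaporates.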

The third approach---and the only one that really seems plausible---would be to construct sampling schemes exclusively for a massively parallel architecture. For instance, if you can divide your variables in some reasonable way, you can probably run semi-independent samplers that communicate on an as-needed basis. The form of this communication might, for instance, look something like an MH-step, or perhaps something more complex.

At any rate, I've done a bit of a literature survey to find examples of systems that work like this, but have turned up surprisingly little. I can't imagine that there's that little work on this problem, though.


David Hall said...
This comment has been removed by the author.
David Hall said...

Regarding #2, it depends on what you mean by "doesn't work at all". Looking at books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf , an approximate variant of Gibbs seems to work quite well for LDA. In my own experiments, I take further liberties, and the results are usually fairly encouraging.

Granted, as the number of processors grows large (and the size of the data fails to grow with it), I'm sure the effects of "cheating" at Gibbs sampling will be more noticeable.

Regardless, I'd be very interested to see a variant of MCMC like you proposed in #3.

David Hall said...

(also, sorry about the first post. I'm not sure how my name got mangled into that...)

Mark Johnson said...

I haven't seriously looked at this, so take these comments with a grain of salt, but I suspect that some kind of particle filter may turn out to parallelize better than MCMC approaches that try to maintain a single chain. I suspect you don't need global communication to ensure that the particle population is distributed according to the desired distribution, but that, e.g., pairwise communication between random pairs is sufficient. Cappé and Robert's work on adaptive importance sampling and population Monte Carlo seems very promising here.

hal said...

David -- I didn't mean to actually imply that it doesn't work :)... just that it seems unlikely to scale dramatically without incurring significant penalties.

Mark -- I've thought about particle filtering, but the worry here is that in order for pfilters to be competitive, you usually have to do an MCMC resampling pass, which then won't distribute :).

Mark Johnson said...

Hi Hal,

While the standard presentations of particle filtering assume a global resampling step, I think that particle filters are still correct (but maybe less efficient) if this resampling is only done over subsets of the particles.

Imagine you were running n sets of particle filters in parallel; each of the sets of particles is a sample from the desired distribution, but of course a global resampling step isn't necessary.

Moreover, since each set of particles is a sample from the same distribution, permuting the particles across the filters doesn't change the distributions, but may promote faster mixing.

This leads to the idea in my first message; after each update we randomly assign particles to groups, and resample only locally within each group.

hal said...

Hi Mark -- In fact, you don't need to do any resampling to be correct -- you just get much better estimates if you do. I think you can probably go one step further... something like:

1. Split data across processors.

2. Run independent particle filters on each processor for some number of data points (say, 100). This can involve MCMC moves, etc., but each is essentially running independently.

3. Randomly shuffle particles between processors. Suppose we have 1000 processors and each gets 100 particles. Then accumulate the 100k particles and randomly re-distribute them to processors. (Can be done much more efficiently so that you don't actually need a "host" to collect them.)

4. Each processor runs a new set of MCMC moves on its particles.

5. Go to 2.
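Purely as a sketch of the loop above, on a toy problem (inferring the mean of a Gaussian from sharded data, with the "processors" simulated sequentially): all constants here are invented, and the MCMC rejuvenation moves of step 4 are omitted for brevity.

```python
# Hypothetical sketch of the 5-step scheme above. Shard sizes, priors,
# and particle counts are all made up; the per-processor MCMC moves of
# step 4 are omitted.
import math
import random

random.seed(0)
TRUE_MEAN = 2.0
N_PROC, N_PART, N_ROUNDS, BATCH = 8, 50, 5, 20

# Step 1: split the data across processors.
data = [random.gauss(TRUE_MEAN, 1.0) for _ in range(N_PROC * N_ROUNDS * BATCH)]
shards = [data[i::N_PROC] for i in range(N_PROC)]

# Particles are draws of the unknown mean from a broad prior.
pools = [[random.gauss(0.0, 5.0) for _ in range(N_PART)] for _ in range(N_PROC)]

def log_lik(theta, xs):
    return sum(-0.5 * (x - theta) ** 2 for x in xs)

for r in range(N_ROUNDS):
    # Step 2: each processor filters its next batch of data independently.
    for p in range(N_PROC):
        batch = shards[p][r * BATCH:(r + 1) * BATCH]
        logw = [log_lik(th, batch) for th in pools[p]]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        pools[p] = random.choices(pools[p], weights=w, k=N_PART)  # local resample
    # Step 3: accumulate all particles and randomly redistribute them.
    flat = [th for pool in pools for th in pool]
    random.shuffle(flat)
    pools = [flat[i * N_PART:(i + 1) * N_PART] for i in range(N_PROC)]
    # Step 4 (MCMC moves on each processor) would go here; step 5: loop.

est = sum(th for pool in pools for th in pool) / (N_PROC * N_PART)
print(est)  # should land near TRUE_MEAN
```

Without the omitted rejuvenation moves the particle set degenerates over rounds, which is exactly why step 4 matters; the shuffle in step 3 only redistributes particles, it doesn't diversify them.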

I'd have to write it down to verify, but at least intuitively it seems like this would still leave us with samples from the true posterior. The shuffling is done essentially to ensure some communication between the different subsets of data (which somehow seems important.)

Of course, we might not actually want to do particle filtering but rather some batch method. Is there an easy pfilter -> batch conversion? (I suspect there should be, but haven't worked through it.)

Mark Johnson said...

Hi Hal,

Yes, what you sketch is the kind of thing I have in mind. Particle filtering is usually viewed as an on-line procedure, but it doesn't have to be; the "adaptive importance sampling" work I mentioned earlier develops this idea.


Unknown said...


You might be interested in my paper on "circular coupling". Among other things, this technique allows one to simulate a single Markov chain in parallel, by stitching together parts done on different processors to create a single coherent Markov chain. (Actually a circular one, as you can guess from the title.)

You can get it from here.

Radford Neal

Unknown said...

Hi Sir,

I am doing an MTech at the Indian Institute of Technology Bombay, and I am working on a project on speech summarization. Is there any open-source tool available for performing speech summarization to which I can make modifications? I have no more than 6 months for this project, and I guess starting from scratch would be tough in such a short span.

Can you give me some idea as to how I should start?

Thanks & Regards
T. Shivkiran

Anonymous said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Anonymous said...

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要打工兼差打工,兼差,八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店日領上班小姐,水水們如果想要擁有打工工作、晚上兼差工作兼差打工假日兼職兼職工作酒店兼差兼差打工兼差日領工作晚上兼差工作酒店工作酒店上班酒店打工兼職兼差兼差工作酒店上班等,想了解酒店相關工作特種行業內容,想兼職工作日領假日兼職兼差打工、或晚班兼職想擁有鋼琴酒吧又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

等相關服務 幫您快速的實現您的夢想~!!

combattery84 said...

IBM ThinkPad R60 Battery
IBM ThinkPad T60 Battery
IBM ThinkPad T41 Battery
IBM ThinkPad T43 Battery
IBM ThinkPad X40 Battery
Thinkpad x24 battery
ThinkPad G41 battery
IBM thinkpad r52 battery
Thinkpad x22 battery
IBM thinkpad t42 battery
IBM thinkpad r51 battery
Thinkpad r50 battery
IBM thinkpad r32 battery
Thinkpad x41 battery
SONY VGP-BPS5 battery
SONY VGP-BPL2C battery
SONY VGP-BPS2A battery
SONY VGP-BPS2B battery
SONY PCGA-BP1N battery
SONY PCGA-BP2E battery
SONY PCGA-BP2S battery
SONY PCGA-BP2T battery
SONY PCGA-BP2V battery
SONY PCGA-BP4V battery
SONY PCGA-BP71 battery
SONY PCGA-BP71A battery
SONY VGP-BPL1 battery
SONY VGP-BPL2 battery