I've been thinking a lot recently about how to do MCMC on massively parallel architectures, for instance in a (massively) multi-core setup (either with or without shared memory).

There are several ways to approach this problem.

The first is the "brain dead" approach. If you have N-many cores, just run N-many parallel (independent) samplers. Done. The problem here is that if N is really large (say, 1000 or greater), then this is probably a complete waste of space/time.
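For concreteness, the embarrassingly parallel version is trivial to write down. Here's a minimal sketch in Python; the target (a standard normal), the proposal, and the chain length are all made-up illustration values, not anyone's actual setup:

```python
import math
import multiprocessing as mp
import random

def run_chain(seed, n_steps=10_000):
    """One independent random-walk Metropolis chain on an N(0, 1) target."""
    rng = random.Random(seed)
    log_p = lambda v: -0.5 * v * v  # unnormalized log-density of the target
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = x + rng.gauss(0.0, 1.0)
        log_alpha = log_p(prop) - log_p(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = prop
        samples.append(x)
    return samples

if __name__ == "__main__":
    # N cores -> N completely independent chains; zero communication.
    with mp.Pool(4) as pool:
        chains = pool.map(run_chain, range(4))
```

The "waste" is exactly that all N chains pay the full burn-in cost and none of them share anything they learn.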

The next approach works if you're doing something like (uncollapsed) Gibbs sampling. Here, the Markov blankets usually separate in a fairly reasonable way. So you can literally distribute the work precisely as specified by the Markov blankets. With a bit of engineering, you can probably do this in a pretty effective manner. The problem, of course, is if you have strongly overlapping Markov blankets. (I.e., if you don't have good separation in the network.) This can happen either due to model structure, or due to collapsing certain variables. In this case, this approach just doesn't work at all.
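When the blankets do separate, the standard trick is a "chromatic" schedule: color the graph so no two neighbors share a color, then update each color class in parallel. A toy sketch on a Gaussian chain model (the model, its conditionals, and all names here are my own illustration, not any particular system):

```python
import math
import random

def conditional(x, i, eps=0.5):
    """Mean/std of x[i] given its neighbors, for a Gaussian chain with joint
    density proportional to exp(-0.5*sum((x[j+1]-x[j])**2) - 0.5*eps*sum(x[j]**2))."""
    nbrs = [x[j] for j in (i - 1, i + 1) if 0 <= j < len(x)]
    prec = len(nbrs) + eps  # conditional precision
    return sum(nbrs) / prec, 1.0 / math.sqrt(prec)

def chromatic_sweep(x, rng, eps=0.5):
    """Even- and odd-indexed variables are the two color classes of a chain
    graph; within a class the variables are conditionally independent given
    the other class, so each inner loop could be farmed out across cores
    with no locking."""
    for parity in (0, 1):
        for i in range(parity, len(x), 2):
            mu, sd = conditional(x, i, eps)
            x[i] = rng.gauss(mu, sd)
    return x
```

The whole approach lives or dies on the chromatic number staying small; strongly overlapping blankets are exactly the case where it blows up.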

The third approach---and the only one that really seems plausible---would be to construct sampling schemes exclusively for a massively parallel architecture. For instance, if you can divide your variables in some reasonable way, you can probably run semi-independent samplers that communicate on an as-needed basis. The form of this communication might, for instance, look something like an MH-step, or perhaps something more complex.
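One concrete form the "MH-step communication" could take is a replica-exchange (parallel tempering) style swap: each worker runs its own chain at an inverse temperature, and occasionally a pair of workers proposes to exchange states, accepting with the usual MH ratio. A sketch, with illustrative names and temperatures:

```python
import math
import random

def swap_states(x1, x2, log_p, beta1, beta2, rng):
    """MH exchange move between two chains at inverse temperatures beta1, beta2.
    Acceptance probability: min(1, exp((beta1 - beta2) * (log_p(x2) - log_p(x1)))).
    This is the only point at which the two workers need to communicate."""
    log_alpha = (beta1 - beta2) * (log_p(x2) - log_p(x1))
    if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
        return x2, x1  # exchange accepted
    return x1, x2      # rejected: both chains keep their states
```

The exchange leaves the product distribution over the pair of chains invariant, so each worker stays correct while information still flows between them on an as-needed basis.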

At any rate, I've done a bit of a literature survey to find examples of systems that work like this, but have turned up surprisingly little. I can't imagine that there's that little work on this problem, though.


## 14 comments:

Regarding #2, it depends on what you mean by "doesn't work at all". Taking a look at books.nips.cc/papers/files/nips20/NIPS2007_0672.pdf , doing an approximate variant of Gibbs seems to work quite well for LDA. In my own experiments, I take further liberties, and the results are usually fairly encouraging.

Granted, as the number of processors grows large (and the size of the data fails to grow with it), I'm sure the effects of "cheating" at Gibbs sampling will be more noticeable.

Regardless, I'd be very interested to see a variant of MCMC like you proposed in #3.

(also, sorry about the first post. I'm not sure how my name got mangled into that...)

I haven't seriously looked at this, so take these comments with a grain of salt, but I suspect that some kind of particle filter may turn out to parallelize better than MCMC approaches that try to maintain a single chain. I suspect you don't need global communication to ensure that the particle population is distributed according to the desired population, but that e.g., pairwise communication between random pairs is sufficient. Cappe and Robert's work on adaptive importance sampling and population monte carlo seems very promising here.

David -- I didn't mean to actually imply that it doesn't work :)... just that it seems unlikely to scale dramatically without incurring significant penalties.

Mark -- I've thought about particle filtering, but the worry here is that in order for pfilters to be competitive, you usually have to do an MCMC resampling pass, which then won't distribute :).

Hi Hal,

While the standard presentations of particle filtering assume a global resampling step, I think that particle filters are still correct (but maybe less efficient) if this resampling is only done over subsets of the particles.

Imagine you were running n sets of particle filters in parallel; each of the sets of particles is a sample from the desired distribution, but of course a global resampling step isn't necessary.

Moreover, since each set of particles is a sample from the same distribution, permuting the particles across the filters doesn't change the distributions, but may promote faster mixing.

This leads to the idea in my first message; after each update we randomly assign particles to groups, and resample only locally within each group.
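A minimal sketch of that local-resampling idea (multinomial resampling within randomly assigned groups; the function names and the weight-equalization scheme are my own illustration):

```python
import random

def group_resample(particles, weights, n_groups, rng):
    """Randomly partition the particles into groups and resample only within
    each group -- no global communication. Each group's total weight is
    preserved and spread evenly over its resampled particles.
    Assumes len(particles) is divisible by n_groups."""
    idx = list(range(len(particles)))
    rng.shuffle(idx)  # random assignment of particles to groups
    size = len(particles) // n_groups
    new_particles, new_weights = [], []
    for g in range(n_groups):
        grp = idx[g * size:(g + 1) * size]
        grp_w = [weights[i] for i in grp]
        draws = rng.choices(grp, weights=grp_w, k=len(grp))
        new_particles.extend(particles[i] for i in draws)
        new_weights.extend([sum(grp_w) / len(grp)] * len(grp))
    return new_particles, new_weights
```

Since each group resamples from its own weighted empirical distribution, no group ever needs to see another group's particles during the resampling itself.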

Hi Mark -- In fact, you don't need to do any resampling to be correct -- you just get much better estimates if you do. I think you can probably go one step further... something like:

1. Split data across processors.

2. Run independent particle filters on each processor for some number of data points (say, 100). This can involve MCMC moves, etc., but each is essentially running independently.

3. Randomly shuffle particles between processors. Suppose we have 1000 processors and each gets 100 particles. Then accumulate the 100k particles and randomly re-distribute them to processors. (Can be done much more efficiently so that you don't actually need a "host" to collect them.)

4. Each processor runs a new set of MCMC moves on its particles.

5. Go to 2.

I'd have to write it down to verify, but at least intuitively it seems like this would still leave us with samples from the true posterior. The shuffling is done essentially to ensure some communication between the different subsets of data (which somehow seems important).
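The global shuffle in step 3 can be sketched as follows (names are illustrative, and in practice this would be an all-to-all exchange rather than collection at a host):

```python
import random

def shuffle_particles(per_proc, rng):
    """Pool every processor's particles and redistribute them uniformly at
    random, preserving each processor's particle count (step 3 above)."""
    sizes = [len(p) for p in per_proc]
    pool = [x for p in per_proc for x in p]
    rng.shuffle(pool)
    out, start = [], 0
    for s in sizes:
        out.append(pool[start:start + s])
        start += s
    return out
```

Because every particle set is (roughly) a sample from the same distribution, a uniform permutation like this shouldn't change what any processor is targeting, only which particles it happens to hold.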

Of course, we might not actually want to do particle filtering but rather some batch method. Is there an easy pfilter -> batch conversion? (I suspect there should be, but haven't worked through it.)

Hi Hal,

Yes, what you sketch is the kind of thing I have in mind. Particle filtering is usually viewed as an on-line procedure, but it doesn't have to be; the "adaptive importance sampling" work I mentioned earlier develops this idea.

Mark


Hi,

You might be interested in my paper on "circular coupling". Among other things, this technique allows one to simulate a single Markov chain in parallel, by stitching together parts done on different processors to create a single coherent Markov chain. (Actually a circular one, as you can guess from the title.)

You can get it from here.

Radford Neal

Hi Sir,

I am doing an MTech at the Indian Institute of Technology Bombay, working on a project on speech summarization. Is there any open source tool available for speech summarization that I can modify? I have no more than 6 months for this project, and I guess starting from scratch would be tough in such a short span.

Can you give me some idea as to how I should start?

Thanks & Regards

T. Shivkiran

shivkiran@cse.iitb.ac.in


