## 25 October 2007

### Non-parametric versus model selection/averaging

Non-parametric approaches (probably the most familiar of which is the Dirichlet process, but there are a whole host of them) are nice because they don't require us to pre-specify a bunch of things that in standard parametric inference would essentially be a model selection issue. For instance, in the DP, we needn't specify how many "clusters" gave rise to our data (in the context of a mixture model).

This brings up the immediate question, though: instead of doing inference in a non-parametric model, why don't you just do model selection (eg by comparing marginals) or model averaging. You can just vary whatever it is that is the "non-parametric" part of the model. For instance, in a DP, you run a bunch of inferences with different numbers of clusters and either choose the best (model selection) or average with respect to the marginals (model averaging). In something like an IBP, you can run with a different number of latent factors and select or average.

I've been asked this general question a few times by non-ML people and I rarely feel like I can give a compelling answer. In particular, I'm not aware of any non-toy experimental comparisons between doing model selection/averaging in any of these models. And even toy ones are hard to come by. But even beyond empirical evidence, I often have a hard time even formulating a coherent qualitative argument.

Here are some points I've come up with, but maybe commentors can either debunk them or add...

1. In some cases, there are lots of parts of the model for which we don't know the structure, so to do model selection/averaging would require trying a ridiculously large number of models. For instance, I might have two components in my model that are DP-ish, so now I have to try quadratically many models.
2. I may not know a good upper/lower bound on the number of components (eg., in a DP). So I'm going to have to try a really large range. In fact, although it's well known that the expected number of clusters in a DP grows as o(log N), where N is the number of data points, it is actually unbounded (and there's a conjecture that it's w(log log N), which isn't terribly slow).
3. Comparing marginal likelihoods across models with a different number of parameters is just plain hard. In fact, for most cases, I don't know how to do it, especially if you want to live in MCMC world. (In variational world you could compare the lower bound on the marginals, but it's always a bit nerve wracking to compare two lower bounds -- you'd rather compare lowers and uppers.) I'm aware of things like reversible jump MCMC and so on, but in most cases these aren't actually applicable to the models you want. Alternatively, if all you want to do is select (not average), you could always do something with held-out data.
The problem is that I can think of counter-arguments to most of these points. In the case of (1), you could argue that if the space is too big, then your sampler isn't going to hit everywhere anyway. In the case of (2), my guess is that for most of these models the marginal will be semi-convex, so you can just start small and keep increasing until things seem to get worse. For (3), this seems to be an argument for developing better MCMC techniques for comparing marginals, not necessarily an argument in favor of non-parametric methods.

But I can go back yet again. To counter the counter to (1), you can argue that the sampler is at least guaranteed after a long time to hit what you care about, whereas if you construct some arbitrary search policy, you may not be. For (2), well...I don't know...I'm pretty convinced by the counter-argument to (2) :P... For (3), you could just disagree and say: why should we develop better MCMC techniques for comparing marginals when we can get away from this whole business by doing non-parametric inference.

Overall, I think non-parametric inference is interesting, useful and fun. But I'd like better arguments against the nay-sayers (who, in my experience, are actually typically non-ML people).

(Note that I'm ignoring the case where the non-parametric model is actually known--or roughly known--to be the right model for your problem. Of course if it's the right model, then you should use it. I'm more referring to the question of using non-parametric methods to get around model selection issues.)

Mark said...

This is an interesting issue! In fact, a Bayesian estimator for a parametric model may select a model that only uses a subset of the possible states, particularly if you have a sparse Bayesian prior. Indeed, one way of estimating a non-parametric model is to fit a corresponding parametric model with a state space sufficiently large that not all states will be occupied (or only occupied with very low probability).

I think this makes it clear that non-parametric models aren't necessarily that different to parametric ones.

The great hope (and at this stage I think that's all it is) for non-parametric models is that it will let us formulate and explore models of greater complexity than we could deal with parametrically.

If you'll excuse me patting my own back, I think that the adaptor grammars we presented at NIPS last year are an example of something that would be hard to formulate parametrically. At a very high level, adaptor grammars are an extension of PCFGs that permit an infinite number of possible rules. The possible rules are combinations of other useful rules, and so on recursively. So adaptor grammars are a single framework that integrates the two phases of standard generate-and-prune grammar learning systems (in which a rule-proposal phase is followed by a rule-probability estimation phase that prunes the useless rules).

david blei said...

i agree with mark on both counts.

it's true that nonparametric bayesian models can often be approximated with finite parametric models. the Dirichlet process allows two approximations---one through a finite stick-breaking model and another through a symmetric Dirichlet.

still, even if the finite approximation of the nonparametric model does the trick, it's nice to know what is being approximated. (and this is particularly useful for setting and reasoning about hyperparameters.)

mark's second point is compelling. the promise of NPB models is in moving beyond simply choosing a number of components. NPB models that generate structures, like grammars or trees, allow us to posit complicated combinatorial objects as latent random variables and still hope to infer them from data.

the naysayer might say: this is simply search with an objective function that is the posterior. yes, this is true, at least when you only care about a MAP estimate. but, the NPB posterior gives a nicely regularized objective function trading off what the data imply and a prior preference for simpler (or more complicated) structures.

Anonymous said...

Ultima Online Gold, UO Gold, crestingwait
lotro gold
wow gold
warhammer gold
Age of Conan Gold, AOC Gold

. said...

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
sesliparti
seslisohbet odalari
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet
seslisohbet
sesli chat
seslichat
sesli sohbet sitesi
sesli chat sitesi
sesli sohpet
kamerali sohbet
kamerali chat
webcam sohbet

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex