This brings up the immediate question, though: instead of doing inference in a non-parametric model, why don't you just do model selection (eg by comparing marginals) or model averaging. You can just vary whatever it is that is the "non-parametric" part of the model. For instance, in a DP, you run a bunch of inferences with different numbers of clusters and either choose the best (model selection) or average with respect to the marginals (model averaging). In something like an IBP, you can run with a different number of latent factors and select or average.

I've been asked this general question a few times by non-ML people and I rarely feel like I can give a compelling answer. In particular, I'm not aware of any non-toy experimental comparisons between doing model selection/averaging in any of these models. And even toy ones are hard to come by. But even beyond empirical evidence, I often have a hard time even formulating a coherent qualitative argument.

Here are some points I've come up with, but maybe commentors can either debunk them or add...

- In some cases, there are lots of parts of the model for which we don't know the structure, so to do model selection/averaging would require trying a ridiculously large number of models. For instance, I might have two components in my model that are DP-ish, so now I have to try quadratically many models.
- I may not know a good upper/lower bound on the number of components (eg., in a DP). So I'm going to have to try a really large range. In fact, although it's well known that the expected number of clusters in a DP grows as o(log N), where N is the number of data points, it is actually unbounded (and there's a conjecture that it's w(log log N), which isn't terribly slow).
- Comparing marginal likelihoods across models with a different number of parameters is just plain hard. In fact, for most cases, I don't know how to do it, especially if you want to live in MCMC world. (In variational world you could compare the lower bound on the marginals, but it's always a bit nerve wracking to compare two lower bounds -- you'd rather compare lowers and uppers.) I'm aware of things like reversible jump MCMC and so on, but in most cases these aren't actually applicable to the models you want. Alternatively, if all you want to do is select (not average), you could always do something with held-out data.

But I can go back yet again. To counter the counter to (1), you can argue that the sampler is at least guaranteed after a long time to hit what you care about, whereas if you construct some arbitrary search policy, you may not be. For (2), well...I don't know...I'm pretty convinced by the counter-argument to (2) :P... For (3), you could just disagree and say: why should we develop better MCMC techniques for comparing marginals when we can get away from this whole business by doing non-parametric inference.

Overall, I think non-parametric inference is interesting, useful and fun. But I'd like better arguments against the nay-sayers (who, in my experience, are actually typically non-ML people).

(Note that I'm ignoring the case where the non-parametric model is actually known--or roughly known--to be the right model for your problem. Of course if it's the right model, then you should use it. I'm more referring to the question of using non-parametric methods to get around model selection issues.)