My colleague Suresh Venkatasubramanian is running as seminar on clustering this semester. Last week we discussed EM and mixture of Gaussians. I almost skipped because it's a relatively old hat topic for me (how many times have I given this lecture?!), and had some grant stuff going out that day. But I decided to show up anyway. I'm glad I did.

We discussed a lot of interesting things, but something that had been bugging me for a while finally materialized in a way about which I can be precise. I basically have two (purely qualitative) issues with mixture of Gaussians as a clustering method. (No, I'm not actually suggesting there's anything wrong with using it in practice.) My first complaint is that many times, MoG is used to get the cluster assignments, or to get soft-cluster assignments... but this has always struck me as a bit weird because then we should be maximizing over the cluster assignments and doing expectations over everything else. Max Welling has done some work related to this in the Bayesian setting. (I vaguely remember that someone else did basically the same thing at basically the same time, but can't remember any more who it was.)

But my more fundamental question is this. When we start dealing with MoG, we usually say something like... suppose we have a density F which can be represented at F = pi_0 F_0 + pi_1 F_1 + ... + pi_K F_K, where the pis give a convex combination of "simpler" densities F_k. This question arose in the context of density estimation (if my history is correct) and the maximum likelihood solution via expectation maximization was developed to solve the density estimation problem. That is, the ORIGINAL goal in this case was to do density estimation; the fact that "cluster assignments" were produced as a byproduct was perhaps not the original intent.

I can actually give a fairly simple example to try to make this point visually. Here is some data generated by a mixture of uniform distributions. And I'll even tell you that K=2 in this case. There are 20,000 points if I recall correctly:

Can you tell me what the distribution is? Can you give me the components? Can you give me cluster assignments?

The problem is that I've constructed this to be non-identifiable. Here are two ways of writing down the components. (I've drawn this in 2D, but only pay attention to the x dimension.) They give rise to exactly the same distribution. One is equally weighted components, one uniform on the range (-3,1) and one uniform on the range (-1,3). The other is to have to components, one with 2/3 weight on the range (-3,3) and one with 1/3 weight on the range (-1,1).

I could imagine some sort of maximum likelihood parameter estimation giving rise to either of these (EM is hard to get to work here because once a point is outside the bounds of a uniform, it has probability zero). They both correctly recover the distribution, but would give rise to totally different (and sort of weird) cluster assignments.

I want to quickly point out that this is a very different issue from the standard "non-identifiability in mixture models issue" that has to do with the fact that any permutation of cluster indices gives rise to the same model.

So I guess that all this falls under the category of "if you want X, go for X." If you want a clustering, go for a clustering -- don't go for density estimation and try to read off clusters as a by-product. (Of course, I don't entirely believe this, but I still think it's worth thinking about.)

## 02 March 2009

### Mixture models: clustering or density estimation

Posted by hal at 3/02/2009 09:13:00 PM

Labels: clustering, machine learning

Subscribe to:
Post Comments (Atom)

## 14 comments:

nice pic. maybe in another post you can expand on the EM vs ME issue

I usually also think that if you want X, go for it. But the question of identifiability is not existing (i mean solved) under classical hypothesis, in the gaussian case, right ?

Nice thing with gaussian mixtures is that the theoretical framework under which you fall makes things much easier than many clustering algorithms...

Here's the abstract from Dempster, Laird and Rubin (1977), in which they describe EM as producing maximum likelihood estimates in the face of missing data (where the missing data in the mixture case is the mixture assignment):

A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived.

Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.

I believe that the problem of non-identifiability is well-understood in the literature, e.g., On the exponential value of labeled samples by Castelli and Cover. Of course if the mixture is not identifiable, it is hard to do anything at all.

But I think that the following question is at the heart of the problem:

given a probability density p, what is a good clustering?

If we cannot answer this (at least under some conditions), there is little hope to get anything more than a heuristic from a finite sample.

misha b

On a related note -- I think we often forget that the EM theorem says that EM converges to a local max in likelihood, and says nothing about convergence of the parameters. If the parameters are nonidentifiable there's no reason to expect them to converge (I've seen this with multinomial mixture models).

Have you considered the relation between EM with Gaussian Mixtures and the original k-means clustering algorithm (isodata), where k-means can can be seen as a limiting form of EM-GM. Also, see an old paper by Mike Kearns on this topic, where they consider a variant where points are assigned probabilistically rather than either hard or soft.

there is little hope to get anything more than a heuristic from a finite sample.

Assignment | Coursework | Thesis

where they consider a variant where points are assigned probabilistically rather than either hard or soft.

Dissertation | Essay

I agree with the issue with mixture of Gaussians as a clustering method. I like your ideas for the future.

jacksonville beach cosmetic dentist

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet

seslisohbet

sesli chat

seslichat

sesli sohbet sitesi

sesli chat sitesi

sesli sohpet

kamerali sohbet

kamerali chat

webcam sohbetReally trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet

seslisohbet

sesli chat

seslichat

sesli sohbet sitesi

sesli chat sitesi

sesli sohpet

kamerali sohbet

kamerali chat

webcam sohbet

Hi,

This is an excellent and informative work, this will really helpful for me in future. I like the way you start and then conclude your thoughts. I completely agree with the points you have shared here about estimating the density or clustering. Thanks for this information. I really appreciate your work, keep it up.

Essay Writing Service

IBM ThinkPad R60 Battery

IBM ThinkPad T60 Battery

IBM ThinkPad T41 Battery

IBM ThinkPad T43 Battery

IBM ThinkPad X40 Battery

Thinkpad x24 battery

ThinkPad G41 battery

IBM thinkpad r52 battery

Thinkpad x22 battery

IBM thinkpad t42 battery

IBM thinkpad r51 battery

Thinkpad r50 battery

IBM thinkpad r32 battery

Thinkpad x41 battery

SONY VGP-BPS2 Battery

SONY VGP-BPS2C Battery

SONY VGP-BPS5 battery

SONY VGP-BPL2C battery

SONY VGP-BPS2A battery

SONY VGP-BPS2B battery

SONY PCGA-BP1N battery

SONY PCGA-BP2E battery

SONY PCGA-BP2NX battery

SONY PCGA-BP2S battery

SONY PCGA-BP2SA battery

SONY PCGA-BP2T battery

SONY PCGA-BP2V battery

SONY PCGA-BP4V battery

SONY PCGA-BP71 battery

SONY PCGA-BP71A battery

SONY VGP-BPL1 battery

SONY VGP-BPL2 battery

Awesome Tips!

I am glad to found such useful post. I really increased my knowledge after read your post which will be beneficial for me.

Thanks for valuable sharing!

Post a Comment