My colleague Suresh Venkatasubramanian is running as seminar on clustering this semester. Last week we discussed EM and mixture of Gaussians. I almost skipped because it's a relatively old hat topic for me (how many times have I given this lecture?!), and had some grant stuff going out that day. But I decided to show up anyway. I'm glad I did.

We discussed a lot of interesting things, but something that had been bugging me for a while finally materialized in a way about which I can be precise. I basically have two (purely qualitative) issues with mixture of Gaussians as a clustering method. (No, I'm not actually suggesting there's anything wrong with using it in practice.) My first complaint is that many times, MoG is used to get the cluster assignments, or to get soft-cluster assignments... but this has always struck me as a bit weird because then we should be maximizing over the cluster assignments and doing expectations over everything else. Max Welling has done some work related to this in the Bayesian setting. (I vaguely remember that someone else did basically the same thing at basically the same time, but can't remember any more who it was.)

But my more fundamental question is this. When we start dealing with MoG, we usually say something like... suppose we have a density F which can be represented at F = pi_0 F_0 + pi_1 F_1 + ... + pi_K F_K, where the pis give a convex combination of "simpler" densities F_k. This question arose in the context of density estimation (if my history is correct) and the maximum likelihood solution via expectation maximization was developed to solve the density estimation problem. That is, the ORIGINAL goal in this case was to do density estimation; the fact that "cluster assignments" were produced as a byproduct was perhaps not the original intent.

I can actually give a fairly simple example to try to make this point visually. Here is some data generated by a mixture of uniform distributions. And I'll even tell you that K=2 in this case. There are 20,000 points if I recall correctly:

Can you tell me what the distribution is? Can you give me the components? Can you give me cluster assignments?

The problem is that I've constructed this to be non-identifiable. Here are two ways of writing down the components. (I've drawn this in 2D, but only pay attention to the x dimension.) They give rise to exactly the same distribution. One is equally weighted components, one uniform on the range (-3,1) and one uniform on the range (-1,3). The other is to have to components, one with 2/3 weight on the range (-3,3) and one with 1/3 weight on the range (-1,1).

I could imagine some sort of maximum likelihood parameter estimation giving rise to either of these (EM is hard to get to work here because once a point is outside the bounds of a uniform, it has probability zero). They both correctly recover the distribution, but would give rise to totally different (and sort of weird) cluster assignments.

I want to quickly point out that this is a very different issue from the standard "non-identifiability in mixture models issue" that has to do with the fact that any permutation of cluster indices gives rise to the same model.

So I guess that all this falls under the category of "if you want X, go for X." If you want a clustering, go for a clustering -- don't go for density estimation and try to read off clusters as a by-product. (Of course, I don't entirely believe this, but I still think it's worth thinking about.)

## 02 March 2009

### Mixture models: clustering or density estimation

Posted by hal at 3/02/2009 09:13:00 PM

Labels: clustering, machine learning

Subscribe to:
Post Comments (Atom)

## 26 comments:

nice pic. maybe in another post you can expand on the EM vs ME issue

I usually also think that if you want X, go for it. But the question of identifiability is not existing (i mean solved) under classical hypothesis, in the gaussian case, right ?

Nice thing with gaussian mixtures is that the theoretical framework under which you fall makes things much easier than many clustering algorithms...

Here's the abstract from Dempster, Laird and Rubin (1977), in which they describe EM as producing maximum likelihood estimates in the face of missing data (where the missing data in the mixture case is the mixture assignment):

A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived.

Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.

I believe that the problem of non-identifiability is well-understood in the literature, e.g., On the exponential value of labeled samples by Castelli and Cover. Of course if the mixture is not identifiable, it is hard to do anything at all.

But I think that the following question is at the heart of the problem:

given a probability density p, what is a good clustering?

If we cannot answer this (at least under some conditions), there is little hope to get anything more than a heuristic from a finite sample.

misha b

On a related note -- I think we often forget that the EM theorem says that EM converges to a local max in likelihood, and says nothing about convergence of the parameters. If the parameters are nonidentifiable there's no reason to expect them to converge (I've seen this with multinomial mixture models).

Have you considered the relation between EM with Gaussian Mixtures and the original k-means clustering algorithm (isodata), where k-means can can be seen as a limiting form of EM-GM. Also, see an old paper by Mike Kearns on this topic, where they consider a variant where points are assigned probabilistically rather than either hard or soft.

Many years ago, I already know the silkroad online game, but at that time I do not play, Recently due to my friends introduce, I began to play the game, but I need the silkroad gold, my friend look me the first time to play so send me some sro gold, but I still need more silkroad online gold, so I want to buy.

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要到打工兼差、打工,兼差,或者八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店當上班小姐,水水們如果想要擁有打工工作、晚上兼差工作、兼差打工、假日兼職、兼職工作、酒店兼差、兼差、打工兼差、日領工作、晚上兼差工作、酒店工作、酒店上班、酒店打工、兼職、兼差、兼差工作、酒店上班等,想了解酒店相關工作和特種行業內容,想兼職工作日領、假日兼職、兼差打工、或晚班兼職想擁有快速賺錢又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

艾葳酒店經紀是合法的公司工作環境高雅時尚，無業績壓力，無脫秀無喝酒壓力，高層次會員制客源，工作輕鬆，可日領、現領。

一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已，對水水們的上班安全一點保障都沒有！艾葳酒店經紀公司的水水們上班時全程媽咪作陪，不需擔心！只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

水水們妳有缺現領、有兼職、缺錢卡奴的煩腦嗎?想到日本留學缺錢嗎?妳是傳播妹??想要擁有高時薪又輕鬆的夜間兼職工作,打工機會和,假日打工,假日兼職賺錢的機會嗎??想實現夢想卻又缺錢沒錢嗎!??

艾葳酒店台北酒店經紀招兵買馬!!徵專業的酒店打工,想要去酒店的水水,想要短期日領,酒店日領,禮服酒店,制服店,酒店經紀,ktv酒店,便服店,酒店工作,禮服店,酒店小姐,酒店經紀人,

等相關服務 幫您快速的實現您的夢想~!!

there is little hope to get anything more than a heuristic from a finite sample.

Assignment | Coursework | Thesis

where they consider a variant where points are assigned probabilistically rather than either hard or soft.

Dissertation | Essay

Классный фильм на кинозоуне.

электронная почта без регистрации

электронная почта без регистрации

I agree with the issue with mixture of Gaussians as a clustering method. I like your ideas for the future.

jacksonville beach cosmetic dentist

one day i went shopping outside,and in an ed hardy store,I found some kinds of ed hardy i love most they are Your website is really good Thank you for the information

ed hardyed hardyed hardy clothinged hardy clothinged hardy shoesed hardy shoesdon ed hardydon ed hardyed hardy clothesed hardy clothesed hardy bagsed hardy bagsed hardy swimweared hardy swimweared hardy jeansed hardy jeansed hardy mensed hardy mens Thank you for the informationReally trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbetsesli chat

sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat

sesli sohbet siteleri

sesli chat siteleri

SesliChat

cılgın sohbet

güzel kızlar

bekar kızlar

dul bayanlar

seviyeli insanlar

yarışma

canlı müzik

izdivac

en güzel evlilik

hersey burada

sesliparti

seslisohbet odalari

Sesli adresi

Sesli Chat

SesliChat Siteleri

Sesli Chat sitesi

SesliChat sitesi

SesliSohbet

Sesli Sohbet

Sesli Sohbet Sitesi

SesliSohbet Sitesi

SesliSohbet Siteleri

Muhabbet Sitesi

kamerali chat

Görüntülü Sohbet

Hasret gülleri

Çet sitesi

SesliSohbet

Sesli Sohbet

Canli sohbet

Turkce sohbet

Kurtce Sohbet

Kurtce Chat

Kurtce Muhabbet

Kurtce Sohbet

Kurdish Chat

SesliChat

Sesli Chat

SesliSanal

Guncel Haber

sohbet Sitesi

Chat sitesi..

Topic is very good, Thank you very much for your information, nice job keep it up.

qualitative dissertation | qualitative dissertations | dissertation writing

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet

seslisohbet

sesli chat

seslichat

sesli sohbet sitesi

sesli chat sitesi

sesli sohpet

kamerali sohbet

kamerali chat

webcam sohbetReally trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet

seslisohbet

sesli chat

seslichat

sesli sohbet sitesi

sesli chat sitesi

sesli sohpet

kamerali sohbet

kamerali chat

webcam sohbet

Hi,

This is an excellent and informative work, this will really helpful for me in future. I like the way you start and then conclude your thoughts. I completely agree with the points you have shared here about estimating the density or clustering. Thanks for this information. I really appreciate your work, keep it up.

Essay Writing Service

HP dv9700 battery

HP F4809A Battery

HP nc8000 battery

HP nc8230 battery

HP pavilion zd8000 battery

HP f2024b battery

HP f4812a battery

HP Pavilion ZV5000 battery

HP Pavilion DV1000 battery

HP Pavilion ZD7000 Battery

HP Pavilion DV2000 battery

HP Pavilion DV4000 Battery

HP Pavilion dv6000 Battery

HP Pavilion DV9000 Battery

HP F4098A battery

HP pavilion zx6000 battery

HP omnibook xe4400 battery

HP omnibook xe4500 battery

HP omnibook xe3 battery

Notebook NX9110 battery

IBM 02K6821 battery

IBM 02K7054 battery

IBM 08K8195 battery

IBM 08K8218 battery

IBM 92P1089 battery

IBM Thinkpad 390 Series battery

IBM Thinkpad 390X battery

IBM ThinkPad Z61m Battery

IBM 02K7018 Battery

IBM thinkpad t41p battery

IBM THINKPAD T42 Battery

IBM ThinkPad R60 Battery

IBM ThinkPad T60 Battery

IBM ThinkPad T41 Battery

IBM ThinkPad T43 Battery

IBM ThinkPad X40 Battery

Thinkpad x24 battery

ThinkPad G41 battery

IBM thinkpad r52 battery

Thinkpad x22 battery

IBM thinkpad t42 battery

IBM thinkpad r51 battery

Thinkpad r50 battery

IBM thinkpad r32 battery

Thinkpad x41 battery

SONY VGP-BPS2 Battery

SONY VGP-BPS2C Battery

SONY VGP-BPS5 battery

SONY VGP-BPL2C battery

SONY VGP-BPS2A battery

SONY VGP-BPS2B battery

SONY PCGA-BP1N battery

SONY PCGA-BP2E battery

SONY PCGA-BP2NX battery

SONY PCGA-BP2S battery

SONY PCGA-BP2SA battery

SONY PCGA-BP2T battery

SONY PCGA-BP2V battery

SONY PCGA-BP4V battery

SONY PCGA-BP71 battery

SONY PCGA-BP71A battery

SONY VGP-BPL1 battery

SONY VGP-BPL2 battery

Awesome Tips!

I am glad to found such useful post. I really increased my knowledge after read your post which will be beneficial for me.

Thanks for valuable sharing!

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..

seslisohbet

seslichat

sesli sohbet

sesli chat

sesli

sesli site

görünlütü sohbet

görüntülü chat

kameralı sohbet

kameralı chat

sesli sohbet siteleri

sesli chat siteleri

görüntülü sohbet siteleri

görüntülü chat siteleri

kameralı sohbet siteleri

canlı sohbet

sesli muhabbet

görüntülü muhabbet

kameralı muhabbet

seslidunya

seslisehir

sesli sex

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..

seslisohbet

seslichat

sesli sohbet

sesli chat

sesli

sesli site

görünlütü sohbet

görüntülü chat

kameralı sohbet

kameralı chat

sesli sohbet siteleri

sesli chat siteleri

sesli muhabbet siteleri

görüntülü sohbet siteleri

görüntülü chat siteleri

görüntülü muhabbet siteleri

kameralı sohbet siteleri

kameralı chat siteleri

kameralı muhabbet siteleri

canlı sohbet

sesli muhabbet

görüntülü muhabbet

kameralı muhabbet

birsesver

birses

seslidunya

seslisehir

sesli sex

Post a Comment