24 April 2006

Unsupervised Learning: Why?

Unsupervised learning is very popular for NLP problems, including MT, summarization, IE, etc. I've done work in this area, as have many other people. For about six months now, I have been asking myself: should we ever really do unsupervised learning?

This question may seem silly, but there is a fairly compelling argument. The argument is that, regardless of whether we are in academia or industry, we will have to convince someone else that our system is doing a good job. To do this, we need evaluation data, which means we need to annotate. But once we've annotated, we should at the very least do semi-supervised learning, not unsupervised learning.

The problem with this argument is that it presupposes the existence of an automatic evaluation metric.

My newly formed perspective is the following. We should only do unsupervised learning if we do not have a trustworthy automatic evaluation metric (i.e., a Type III metric). I cannot currently think of a compelling argument against this.

9 comments:

Kevin Duh said...

Suppose you run out of money after annotating the test data. Then you can only do unsupervised learning on the training data. :)

More seriously, I think if unsupervised learning is to be used, it should demonstrate that it outperforms supervised learning with a practical amount of labeled data. "Practical amount" is a fuzzy term and depends on your resources. If you're a grad student, annotating 50 sentences with POS tags is practical; annotating the entire Wall Street Journal with syntax trees is not practical.
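Here's a minimal sketch of the kind of comparison I mean, assuming a scikit-learn-style setup (the 20-newsgroups data and the 50-label budget are just placeholders for a real task and a real annotation budget):

from sklearn.cluster import KMeans
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

cats = ["sci.med", "rec.autos"]  # stand-in binary task
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer()
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# Supervised, with only a "practical amount" of labels (50 documents).
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train[:50], train.target[:50])
sup_acc = accuracy_score(test.target, clf.predict(X_test))

# Unsupervised: cluster all the training text, then map each cluster to
# its majority label, using the same 50 labels for the mapping alone.
km = KMeans(n_clusters=2, n_init=10).fit(X_train)
lab50, clu50 = train.target[:50], km.labels_[:50]
mapping = {c: np.bincount(lab50[clu50 == c], minlength=2).argmax()
           for c in range(2)}
pred = [mapping[c] for c in km.predict(X_test)]
unsup_acc = accuracy_score(test.target, pred)

print("supervised (50 labels): %.3f  unsupervised: %.3f"
      % (sup_acc, unsup_acc))

If the unsupervised number doesn't beat the 50-label supervised number, that (to me) undercuts the case for going unsupervised on that task.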

hal said...

Cross-validation?

I agree about "practical amount" though.

hal said...

I guess this runs into the difficulty that you need data to develop features on. Doing this requires held-out data, which means cross-validation really wouldn't work (at least if you don't want to cheat).

So I guess it also depends on whether you're developing a new system or just trying to port an existing one.
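Here's a rough sketch of the split I have in mind, assuming scikit-learn (the classifier and the synthetic data are placeholders): hold out a chunk for feature development, then cross-validate only on the rest, so the folds stay clean.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Carve off a development set to inspect while designing features;
# never report numbers on it.
X_dev, X_rest, y_dev, y_rest = train_test_split(
    X, y, train_size=0.2, random_state=0)

# ... look at X_dev / y_dev, design features ...

# Cross-validate only on the data that was never looked at.
scores = cross_val_score(LogisticRegression(max_iter=1000),
                         X_rest, y_rest, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))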

Anonymous said...

For NLP, at least we can test the performance of an unsupervised learning method in one language. If it works well, we can apply it to other languages to reduce the annotation labor.

Anonymous said...

This is based on the assumption that supervised learning is "better" anyway... unsupervised learning being an alternative for those who cannot afford an annotated corpus.
Well, I'm not sure about that. Supervised and unsupervised learning are very different. IMO, unsupervised learning really IS learning, while supervised learning is more like a mapping process between two structures that must be quite similar.
To me, supervised and unsupervised learning are not opposed; they complement each other. Unsupervised learning builds concepts from raw data (a step often replaced by the use of 'features'); THEN, supervised learning builds an 'interface' between this set of concepts and another one (human, machine...).

hal said...

I think I can agree with anonymous: I was focusing on cases where the problem really looks supervised, but we just don't have data so we build an unsupervised model for it.

However, playing devil's advocate: can someone give me an NLP problem that really is unsupervised by nature?

hal said...

Deepak --

I don't think you and I disagree. (There's a small caveat that I don't really know how you distinguish a system from a problem. I can imagine having a system B and then wanting to decide whether I should try to improve on it; I don't know how I can decide that this is an interesting problem to work on without actually trying it.)

I'm actually not sure what you're disagreeing with. I'm essentially saying that as long as we can evaluate automatically, we should take a supervised approach. Only if we cannot evaluate automatically (e.g., for word alignment) should we do something unsupervised. Why? For exactly Alex's reason: if we don't know that the metric we're optimizing is good, then we might as well optimize anything and then show via human evaluations (or, in Alex's case, Bleu evaluations) that we've done something useful.

I think the answer to your counter statement is just that someone has gone through the effort of showing that improvements in the white-box often (or are expected to) lead to improvements in the black-box. I completely sympathize with this. Unfortunately, this doesn't happen all that much :).

hal said...

Isn't this too permissive? I can almost always build two baselines with noticeably different performance, or a baseline that's noticeably worse than a human. And it seems odd to want a noticeable difference between two humans: usually we're fighting to get our humans to agree!

hal said...

This seems to put the cart before the horse: you're saying that a problem is worth working on if, after I develop lots of solutions, I can see a difference. Isn't the point of deciding whether a problem is worth working on to avoid wasting time building all these systems? I also agree (to a certain extent; this deserves its own post) that introducing artificial constraints to make humans agree is bad.

I'm still not sure what in my original statement you disagree with, though. As far as I can tell, we're in agreement about when one should use unsupervised learning: essentially, whenever we are forced to do human evaluations.
