Comments on natural language processing blog: "I Don't Think We Understand Structured Prediction"

Sachin Agarwal (2007-09-13):
I want to know about the work done on active learning for structured prediction.

hal (2007-02-18):
You're probably right. I've thought about it a bit from this perspective too. It seems to me that there are basically two things that matter: (1) what structure does the loss function imply, and (2) what structure do the features imply (or, which aspects of the structure do we believe are useful)?

E.g., POS tagging: the loss function (typically Hamming loss) implies nothing about the structure. But our knowledge of language says there are at least local dependencies, and probably larger syntactic dependencies. The former are easier and more tractable to deal with, so we use a Markov model.

E.g. 2, machine translation: the loss function (e.g., Bleu) implies that we had better focus on getting 4-grams correct. Our knowledge says that things like n-gram language models are useful (so we get Markov dependencies), and that syntactic information is probably useful (so we get syntax). Luckily, the Markov dependencies overlap with the 4-grams from Bleu, so we have only two issues to contend with.

It seems that the tractable/intractable model issue is really one of: (A) does our loss function lead to a tractable structure, and (B) do our features?

My sense --- and there's growing empirical support for this --- is that an impoverished feature set in a tractable model is almost always worse than a rich feature set in an intractable model.

Kevin Duh (2007-02-15):
You posed the question of whether there is a good ontology/classification for structured prediction <I>techniques</I>. I'd like to ask a different question: are there different classes of structured prediction <I>problems</I>? For instance, is predicting a sequence fundamentally different from predicting a tree?

For "tractable" techniques like M3Ns and CRFs, it is considerably more difficult to directly extend models built for sequence prediction to tree prediction. For other techniques like the perceptron, all that is required is that some argmax is defined for the structure to be predicted. (Correct me if I'm wrong.)

This leads me to wonder: are there different classes of structured prediction problems that will do well with different classes of structured prediction techniques?
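Kevin's observation that the perceptron needs only an argmax can be illustrated with a minimal structured-perceptron sketch for sequence tagging (everything here — the function names, the tiny emission/transition feature set, the toy tags — is illustrative, not code from this thread). The point is that only `argmax` is structure-specific: swap the Viterbi decoder for a tree decoder and `train` is unchanged.

```python
# Minimal structured-perceptron sketch (illustrative, not from the thread).
# Only argmax() depends on the output structure being a sequence.
from collections import Counter

def features(x, y):
    """Emission + transition indicator features for tagged sequence (x, y)."""
    f = Counter()
    prev = "<s>"
    for word, tag in zip(x, y):
        f[("emit", word, tag)] += 1
        f[("trans", prev, tag)] += 1
        prev = tag
    return f

def argmax(w, x, tags):
    """Viterbi decoding: highest-scoring tag sequence under weights w."""
    V = {t: w.get(("emit", x[0], t), 0.0) + w.get(("trans", "<s>", t), 0.0)
         for t in tags}
    backptrs = []
    for word in x[1:]:
        newV, bp = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: V[p] + w.get(("trans", p, t), 0.0))
            newV[t] = (V[best] + w.get(("trans", best, t), 0.0)
                       + w.get(("emit", word, t), 0.0))
            bp[t] = best
        V, backptrs = newV, backptrs + [bp]
    last = max(tags, key=lambda t: V[t])
    y = [last]
    for bp in reversed(backptrs):  # follow back-pointers right to left
        y.append(bp[y[-1]])
    return list(reversed(y))

def train(data, tags, epochs=5):
    """On each mistake, update w += f(x, y_gold) - f(x, y_predicted)."""
    w = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = argmax(w, x, tags)
            if y_hat != y_gold:
                gold, pred = features(x, y_gold), features(x, y_hat)
                for k in set(gold) | set(pred):
                    w[k] = w.get(k, 0.0) + gold[k] - pred[k]
    return w
```

Training on a couple of toy determiner/noun sentences is enough for the model to tag them correctly; by contrast, making this a CRF or M3N would require structure-specific inference (forward-backward or a tree analogue) inside the training loop itself, which is the asymmetry Kevin points to.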