13 February 2007

I Don't Think We Understand Structured Prediction

I've posted on structured prediction several times before; this is another post, but from a different perspective. Besides, the most recent one was last May, so it's been a while. It is prompted by a recent OpEd by John Lafferty and Larry Wasserman, Challenges in Machine Learning, in Statistica Sinica, in which they mention structured prediction (SP) as one of three major open problems in machine learning (the other two being semi-supervised learning and sparse learning in high dimensions).

I've been trying to think of a good way of ontologizing SP methods recently, primarily because I want to understand them better. The more I think about it, the more I feel that we really don't understand the important issues. This is partially evidenced by the observation that in the past six years (essentially, since the first CRF paper), something like 30 different algorithms have been proposed for this problem. The fact that there are so many proposed algorithms tells me that we don't really understand what's going on; otherwise we wouldn't need to keep coming up with new techniques. (I'm not innocent.)

To me, the strongest division is between techniques that require tractable inference in the underlying problem (e.g., CRFs, M3Ns, SVM-struct, the structured perceptron, MIRA, MEMMs, HMMs, etc.) and those that don't (e.g., the incremental perceptron, LaSO, Searn, the Liang/Lacoste-Julien/Klein/Taskar MT algorithm, local predictors à la Roth and colleagues, etc.). Whether this is truly the most important distinction is unclear, but it is to me. I think of the former as "top down" or "technology push" techniques, and the latter as "bottom up" or "application pull" techniques. Both are entirely reasonable ways of looking at the problem, but as of now the connections between the two types are only weakly understood.
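To make the tractability requirement concrete, here's a minimal sketch of a structured perceptron for sequence labeling, with Viterbi supplying the exact argmax that this whole first family of methods depends on. (The dense-array representation and the function names are my own illustration, not anything from the papers above.)

```python
import numpy as np

def viterbi(unary, trans):
    """Exact argmax over tag sequences under first-order Markov scores.
    unary: (T, K) per-position tag scores; trans: (K, K) transition scores."""
    T, K = unary.shape
    best = unary[0].copy()                 # best score of any path ending in each tag
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        cand = best[:, None] + trans + unary[t]   # (previous tag, current tag)
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    y = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        y.append(int(back[t][y[-1]]))
    return y[::-1]

def perceptron_epoch(emit, trans, data):
    """One structured-perceptron pass over (words, tags) examples, both as
    int arrays; emit is a (V, K) emission weight matrix, trans is (K, K).
    The update is w += features(truth) - features(prediction), and the whole
    algorithm rests on viterbi() returning the exact argmax."""
    for words, gold in data:
        pred = viterbi(emit[words], trans)
        for w, g, p in zip(words, gold, pred):
            emit[w, g] += 1.0              # cancels wherever g == p
            emit[w, p] -= 1.0
        for t in range(1, len(gold)):
            trans[gold[t - 1], gold[t]] += 1.0
            trans[pred[t - 1], pred[t]] -= 1.0
```

Swap the exact viterbi() for an approximate search and the mistake-bound analysis no longer applies; that gap is precisely what the second family of methods tries to take seriously.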

An alternative division is between generative techniques (HMMs), conditional methods (CRFs), and margin-based techniques (everything else, minus Searn, which is non-committal on this issue). I don't think this is an important distinction: with the exception of generative models being quite different, conditional and margin-based methods are essentially the same in my mind. (Yes, I understand there are important differences, but in the grand scheme of things this is not such a relevant distinction.)

A related issue is whether an algorithm admits a kernelized version. Pretty much all of them do, and even where they don't, I don't see this as a very important issue.

There are other issues, some of which are mentioned in the Lafferty/Wasserman paper. One is consistency. (For those unfamiliar with the term, a consistent algorithm is essentially one that is guaranteed to find the optimal solution if given infinite amounts of data.) CRFs are consistent. M3Ns are not. Searn is not, even if you use a consistent classifier as the base learning algorithm. My sense is that to statisticians, things like consistency matter a lot. In practice, my opinion is that they're less important because we never have that much data.

Even within the family of techniques that don't require tractability, there is a great deal of variation. To my knowledge, this family was essentially spawned by Collins and Roark with the incremental perceptron. My LaSO thing was basically a minor tweak on the IP, and Searn is rather different, but other published methods are largely minor variants on the IP and/or LaSO, depending on where you measure from. And I truly mean minor tweaks: the essential difference between LaSO and the IP is whether you restart your search at the truth when you err. Doing restarts tends to help, and it lets you prove a theorem. We later had a NIPS workshop paper that restarted, but allowed the algorithm to take a few steps off the right path before doing so; this helped experimentally and also admitted another theorem. I've seen other work that does essentially the same thing but tweaks exactly how updates are done and/or how restarts happen. The fact that we're essentially trying all the permutations tells me that many things are reasonable, and we're currently in the "gather data" phase of figuring out the best way to do things.
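To make the "minor tweak" claim concrete, here is roughly what the difference looks like for a greedy (beam size 1) left-to-right tagger; the restart_on_error flag below is the entire difference between the IP-style and LaSO-style updates. This is a sketch under my own simplified setup, not code from either paper.

```python
def greedy_example(emit, trans, words, gold, restart_on_error):
    """Train on one example with a greedy left-to-right tagger (beam size 1,
    to keep the sketch tiny). restart_on_error=False behaves like the
    incremental perceptron: the first search error triggers an update and we
    abandon the rest of the example. restart_on_error=True behaves like
    LaSO: update, then resume the search from the true prefix."""
    prev = None
    for t, w in enumerate(words):
        scores = emit[w] if prev is None else emit[w] + trans[prev]
        pred = int(scores.argmax())
        if pred != gold[t]:                  # the search has erred
            emit[w, gold[t]] += 1.0          # perceptron-style update
            emit[w, pred] -= 1.0
            if prev is not None:
                trans[prev, gold[t]] += 1.0
                trans[prev, pred] -= 1.0
            if not restart_on_error:
                return                       # IP: give up on this example
            pred = gold[t]                   # LaSO: restart at the truth
        prev = pred
```

Restarting means every example yields a full trajectory of training decisions made from correct prefixes, which is both why it tends to help and why the theorem goes through.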

3 comments:

Kevin said...

You posed the question of whether there is a good ontology/classification for structured prediction techniques. I'd like to ask a different question: are there different classes of structured prediction problems? For instance, is predicting a sequence fundamentally different from predicting a tree?

For the "tractable" techniques like M3N and CRF, it is considerably more difficult to directly extend models build for sequence prediction problems into tree prediction. For other techniques like perceptron, all that is required is that some argmax is defined for the structure to be predicted (Correct me if I'm wrong.)

This leads me to wonder, are there different classes of structured prediction problems, which will do well with different classes of structured prediction techniques?

hal said...

You're probably right. I've thought about it a bit from this perspective too. It seems to me that there are basically two things that matter: (1) what structure the loss function implies, and (2) what structure the features imply (or, what aspects of the structure we believe are useful).

E.g., POS tagging: the loss function (typically Hamming loss) implies nothing about the structure. But our knowledge of language says there are at least local dependencies, and probably larger syntactic dependencies. The former are easier and more tractable to deal with, so we use a Markov model.

E.g. 2, machine translation: the loss function (e.g., Bleu) implies that we had better focus on getting 4-grams correct. Our knowledge says that things like n-gram language models are useful, so we get Markov dependencies, and syntactic information is probably useful (so we get syntax). Luckily the Markov dependencies overlap with the 4-grams from Bleu, so we only have two issues to contend with.
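To make the contrast concrete: Hamming loss decomposes position-by-position, while even a crude stand-in for the 4-gram component of Bleu (ignoring clipping and the brevity penalty, so not real Bleu) ties adjacent positions together:

```python
def hamming_loss(gold, pred):
    """Decomposes position-by-position: the loss itself demands
    nothing of the structure."""
    return sum(g != p for g, p in zip(gold, pred))

def fourgram_matches(gold, pred):
    """Crude stand-in for the 4-gram component of Bleu: every match
    couples four adjacent positions, which is exactly the kind of
    Markov dependency the model then needs to capture."""
    gold_4grams = {tuple(gold[i:i + 4]) for i in range(len(gold) - 3)}
    return sum(tuple(pred[i:i + 4]) in gold_4grams
               for i in range(len(pred) - 3))
```

A model that scores positions independently can drive down the first quantity; to do anything about the second, it has to couple adjacent decisions.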

It seems that the tractable/intractable model issue is really one of: (A) does our loss function lead to a tractable structure and (B) do our features? My sense --- and there's growing empirical support for this --- is that an impoverished feature set in a tractable model is almost always worse than a rich feature set in an intractable model.

sachin said...

I want to know about the work done on active learning for structured prediction.
