29 January 2006

Predictive Models

There are at least two reasonable goals for NLP research: (1) produce predictive systems that mimic humans; (2) produce explanatory models. (2) gets more into CL and CogSci and I may discuss it at a later date. This post is about (1), at least the machine learning approach to (1).

Our goal is to mimic a human on some task. This means we will, at the end of the day, want to measure our performance against human performance. This measurement is our loss function, and is, in my mind, the only thing that really matters. Coming up with a good loss function is really really hard and I will certainly discuss this issue in another post. But suppose we can come up with one: l(t,h) = how much it hurts to produce a hypothesis h when the true answer was t.

The subsequent question is: given some collected data, how can I minimize l. The dominant approach seems to be: approximate l by some well-known loss (binary loss, squared error, absolute loss, etc.) and then use one of the many good classifiers/regressors to solve our original problem.

This step irks me greatly. Not because this approximation is always bad, but because it is never (rarely) stated. Many papers say something like "Our task in XYZ is to minimize misclassification error of problems created by ABC." This is rarely actually true. I would be much happier with the following explanation: "Our task in XYZ is to minimize TUV loss, but we don't know how to do that, so we'll instead minimize misclassification error." This issue manifests itself commonly in another way: putting the cart before the horse. We often gather a data set, label it, solve the labeling problem (eg., by multiclass classification) and then go back and define the problem in such a way that this is an appropriate solution. This is harder to detect, but equally disingenuous.

So let's say we've done things right: we've defined our loss function and we've gotten some labeled data. Now we have to minimize our loss on this data. If we can do this directly, great. If not, we can say so and that we'll approximate it by some other loss l'.

Now we can start looking at features. We have an l' that we can minimize and we want to come up with features that are useful for this minimization. In particular, if ours is a structured problem, and l' is invariant to some change in structure, we needn't reflect this aspect of the structure in our feature functions function (the prototypical example here being Markov features in sequence labeling for Hamming [per-label] loss, though this issue itself could take a whole other post). We should set aside part of the training data for manual inspection to do feature engineering.

Experimentation is the last real step. We've split off a training set, development set and test set (*). We train a model on the training set, evaluate it on the development set. We look at the dev predictions and see what went wrong. We reengineer some features to fix these problems and repeat. Only at the end do we run on the test set and report our results in a paper.

(*) I'm not a fan of cross-validation in such a cyclic process. The features will be engineered based on the training and development data sets, and to then fold all this back into the bag and do cross-validation will give us unrealistically good results.

And as a very final step, in the case that our optimized l' is not the same as l, I would like to see some results that show that they at least correlate reasonably well. And easy way to do this is to run several versions of the model and to report both l and l' scores and let us see that improving l' does indeed improve l.

7 comments:

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Adi said...

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.

tar said...

Мультфильмы онлайн

qishaya said...

one day i went shopping outside,and in an ed hardy store,I found some kinds of ed hardy i love most they are Your website is really good Thank you for the information ed hardy ed hardy ed hardy clothing ed hardy clothing ed hardy shoes ed hardy shoes don ed hardy don ed hardy ed hardy clothes ed hardy clothes ed hardy bags ed hardy bags ed hardy swimwear ed hardy swimwear ed hardy jeans ed hardy jeans ed hardy mens ed hardy mens Thank you for the information

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex