There are at least two reasonable goals for NLP research: (1) produce predictive systems that mimic humans; (2) produce explanatory models. (2) gets more into CL and CogSci and I may discuss it at a later date. This post is about (1), at least the machine learning approach to (1).

Our goal is to mimic a human on some task. This means we will, at the end of the day, want to *measure* our performance against human performance. This measurement is our *loss function*, and is, in my mind, the *only* thing that really matters. Coming up with a good loss function is **really really hard** and I will certainly discuss this issue in another post. But suppose we *can* come up with one: *l(t,h) =* how much it hurts to produce a hypothesis *h* when the true answer was *t*.

The subsequent question is: given some collected data, how can I minimize *l*. The dominant approach seems to be: approximate *l* by some well-known loss (binary loss, squared error, absolute loss, etc.) and then use one of the many good classifiers/regressors to solve our original problem.

This step irks me greatly. Not because this approximation is always bad, but because it is never (rarely) stated. Many papers say something like "Our task in XYZ is to minimize misclassification error of problems created by ABC." This is rarely actually true. I would be much happier with the following explanation: "Our task in XYZ is to minimize TUV loss, but we don't know how to do that, so we'll instead minimize misclassification error." This issue manifests itself commonly in another way: putting the cart before the horse. We often gather a data set, label it, solve the labeling problem (eg., by multiclass classification) and *then* go back and define the problem in such a way that this is an appropriate solution. This is harder to detect, but equally disingenuous.

So let's say we've done things right: we've defined our loss function and we've gotten some labeled data. Now we have to minimize our loss on this data. If we can do this directly, great. If not, we can say so and that we'll approximate it by some other loss *l'*.

Now we can start looking at features. We have an *l'* that we **can** minimize and we want to come up with features that are useful for this minimization. In particular, if ours is a structured problem, and *l'* is invariant to some change in structure, we needn't reflect this aspect of the structure in our feature functions function (the prototypical example here being Markov features in sequence labeling for Hamming [per-label] loss, though this issue itself could take a whole other post). We should set aside part of the training data for manual inspection to do feature engineering.

Experimentation is the last real step. We've split off a training set, development set and test set (*). We train a model on the training set, evaluate it on the development set. We *look* at the dev predictions and see what went wrong. We reengineer some features to fix these problems and repeat. Only at the end do we run on the test set and report our results in a paper.

(*) I'm not a fan of cross-validation in such a cyclic process. The features will be engineered based on the training and development data sets, and to then fold all this back into the bag and do cross-validation will give us unrealistically good results.

And as a very final step, in the case that our optimized *l'* is not the same as *l*, I would like to see some results that show that they at least correlate reasonably well. And easy way to do this is to run several versions of the model and to report both *l* and *l'* scores and let us see that improving *l'* does indeed improve *l*.

“Retire Statistical Significance”: The discussion.

11 hours ago

## 7 comments:

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.

Мультфильмы онлайн

one day i went shopping outside,and in an ed hardy store,I found some kinds of ed hardy i love most they are Your website is really good Thank you for the information

ed hardyed hardyed hardy clothinged hardy clothinged hardy shoesed hardy shoesdon ed hardydon ed hardyed hardy clothesed hardy clothesed hardy bagsed hardy bagsed hardy swimweared hardy swimweared hardy jeansed hardy jeansed hardy mensed hardy mens Thank you for the informationReally trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..

seslisohbet

seslichat

sesli sohbet

sesli chat

sesli

sesli site

görünlütü sohbet

görüntülü chat

kameralı sohbet

kameralı chat

sesli sohbet siteleri

sesli chat siteleri

görüntülü sohbet siteleri

görüntülü chat siteleri

kameralı sohbet siteleri

canlı sohbet

sesli muhabbet

görüntülü muhabbet

kameralı muhabbet

seslidunya

seslisehir

sesli sex

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..

seslisohbet

seslichat

sesli sohbet

sesli chat

sesli

sesli site

görünlütü sohbet

görüntülü chat

kameralı sohbet

kameralı chat

sesli sohbet siteleri

sesli chat siteleri

sesli muhabbet siteleri

görüntülü sohbet siteleri

görüntülü chat siteleri

görüntülü muhabbet siteleri

kameralı sohbet siteleri

kameralı chat siteleri

kameralı muhabbet siteleri

canlı sohbet

sesli muhabbet

görüntülü muhabbet

kameralı muhabbet

birsesver

birses

seslidunya

seslisehir

sesli sex

Post a Comment