01 August 2007

Explanatory Models

I am frequently asked the question: why does your application for solving XXX make such and such an error? (You can easily replace "your" with any possessive noun and the statement remains valid.)

My standard answer is to shrug and say "who knows."

This is quite different from, for instance, work in pattern matching for information extraction (many other citations are possible). In this setting, when the system makes an error, one can ask the system "what pattern caused this error." You can then trace the pattern back to the source documents from which it came and obtain some understanding for what is going on.

This is frequently not the case for your generic sequence labeling algorithm. If, say, a CRF misses a person name, what can you do about it? Can you understand why it made the error. More generally, if a model of any variety errs, can it say anything about why this error came to be.

One way to approach this problem would be to try to inspect the weights of the learned algorithm, but there are so many trade-offs going on internally that I've never been able to do this successfully (by hand, at least --- perhaps a clever tool could help, but I'm not sure). An alternative that I've been thinking about recently (but probably won't work on because I haven't enough time right now) is instead to pose the question as: what is the minimal change to the input required so that I would have made the decision correctly.

I guess one way to think about this is consider the case that a POS system misses tagging "Fred" in the sentence "Fred is not happy" as an NNP and instead calls it a "VBD." Presumably we have a bunch of window features about Fred that give us its identify, prefixes and suffixes, etc. Perhaps if "Fred" had been "Harry" this wouldn't have happened because "Fred" has the spelling feature "-ed." (Ok, clearly this is a very crafted example, but you get the idea.)

The question is: how do you define minimal change in features. If we're in an HMM (where we're willing to assume feature independence), then I don't think this is a big problem. But in the case where "Fred" ends in "-ed", it also ends in "-d" and both of these make it look more like a VBD. Such an explanatory system would ideally like to know that if "-d" weren't there, then neither would be "-d" and use this for computing the minimal change. It would also have to know that certain features are easier to change than others. For instance, if it has only ever seen the word "Xavier" in the training data as an NNP, then it could also suggest that if the word were "Xavier" instead of "Fred" than this error would not have happened. But this is sort of silly, because it gives us no information. (I'm working under the assumption that we want to do this so that we can add/remove features to our model to help correct for errors [on development data :P].)

It seems like neither of these problems is insurmountable. Indeed, just looking at something like feature frequency across the entire training set would give you some sense of which features are easy to change, as well as which ones are highly correlated. (I guess you could even do this on unlabeled data.)

I feel like it's possible to create a methodology for doing this for a specific problem (eg., NE identification), but I'd really like to see some work on a more generic framework that can be applied to a whole host of problems (why did my MT system make that translation?). Perhaps something already exists and I just haven't seen it.

12 comments:

Mark Dredze said...

I've heard a few conversations on this topics, especially when learning is discussed in the context of UIs. One of the reasons that UI people sometimes prefer pattern based or rule based methods is because it is much more intuitive to the user when the system makes a mistake (and how it can be fixed). Users don't think in terms of probabilities and features, so its hard to image a way of exposing that information to a user in a useful way. Even the example you gave, the "ed" suffix, is not necessarily intuitive to the average user. They'd have to realize that this is a common verb suffix, but I think that realization would only come from working directly with these problems and thinking about features in the way we do.

Another possible solution is to use a nearest neighbor paradigm to convey this information to the user. Instead of showing a feature, the system could show a training example that most influenced this mistaken decision (the closest instance to this instance with the same guessed label). That would be equivalent to saying, "I thought this was true because I had seen this example." Statistical learning is based on the principle of learning from lots of examples, so its only natural to show the user which one of those examples caused the problem. However, either approach seems quite tricky and its likely that sometimes it would still be uninformative to the user.

While this isn't something we often consider in the learning community, I have heard it a lot from the HCI crowd. I think it is a major reason why statistical learning methods may appear in backend systems but often don't show up in user applications.

Peter said...

SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking (Yoav Goldberg; Michael Elhadad) covers some of this from a backend point of view.

Yoav said...

Thanks for the mention, Peter.

Personally, I feel that (from the model designer, not the end user, perspective) the question of interest is not why the model made a specific mistake, but
why it got the (difficult) right things right.

For example, in the "Fred as a verb" case, knowing that the model erred because of the -ed feature is a nice anecdote,
but there is not much you can do about it -- you probably wouldn't go and remove the suffix feature from your model (because you believe it is a good one), but
go and look for other features which might help solving that specific case. But you could mostly do that just as well without understanding
that the error was casued by the -ed feature.

On the other hand, knowing why your model got several hard-to-get cases right will give you an insight on
(a) which of your features are really usefull -- it might be that the '-ed' feature never actually helps anything, in which
case you can remove it from your models and (b) how well is the model expected to behave on slightly different data.
I am not talking about the majority of correct decisions which are easy to do and to explain, but on those "top 1%" or so correct decisions
that the model got right only after some serious twicking and feature engineering -- kowing if these result from an overfitting of the data or are a real improvement is very interesting.
As an example, the Charniak and Johnson rereanking parser does surprisingly well on coordination -- does it really learn coordination? Or is it overfitting some economic jargon or topical conventions? Knowing which reranker features really take place
in resolving the coordination cases is (in my opinion) an extremely interesting question.

mark johnson said...

Yoav,

In designing the Charniak-Johnson parser, I did several ablation studies, where I removed one feature class (a group of related features) to measure their effect on overall performance (summaries of these are presented here.)

But I agree with the general comments; these ablation studies don't really let us understand why the model works (or doesn't).

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

酒店上班請找艾葳 said...

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要打工兼差打工,兼差,八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店日領上班小姐,水水們如果想要擁有打工工作、晚上兼差工作兼差打工假日兼職兼職工作酒店兼差兼差打工兼差日領工作晚上兼差工作酒店工作酒店上班酒店打工兼職兼差兼差工作酒店上班等,想了解酒店相關工作特種行業內容,想兼職工作日領假日兼職兼差打工、或晚班兼職想擁有鋼琴酒吧又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

艾葳酒店經紀是合法的公司工作環境高雅時尚,無業績壓力,無脫秀無喝酒壓力,高層次會員制客源,工作輕鬆,可日領現領
一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

水水們妳有缺現領、有兼職缺錢便服店的煩腦嗎?想到日本留學缺錢嗎?妳是傳播妹??想要擁有高時薪又輕鬆的賺錢,酒店和,假日打工,假日兼職賺錢的機會嗎??想實現夢想卻又缺錢沒錢嗎!??
艾葳酒店台北酒店經紀招兵買馬!!徵專業的酒店打工,想要去酒店的水水,想要短期日領,酒店日領,禮服酒店,制服店,酒店經紀,ktv酒店,便服店,酒店工作,禮服店,酒店小姐,酒店經紀人,
等相關服務 幫您快速的實現您的夢想~!!

Adi said...

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.

gamefan12 said...

These errors are so common with me. I run into this all the time. I would love to see a fix for these.
ponte vedra beach cosmetic dentist

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
hersey burada
sesliparti
seslisohbet odalari
Sesli adresi
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex