I spent the second day of workshops at NIPS (while not skiing) attending the Learning when test and train inputs have different distributions workshop. This is closely related to (or really just a different name for) the domain adaptation problem I've been interested in for quite some time. Unfortunately, I can't easily find the list of papers (there were some good ones!), which means that my memory may be lacking in some parts. Here are some points I took away.

Statisticians have worked on this problem for a long time. If you provide insurance, you're going to want a predictor to say whether to give a new person a policy or not. You have lots of data on people and whether they made any claims. Unfortunately, the training data (people you have information on) is limited to those to whom you gave policies. So the test distribution (the entire population) differs from the training distribution (people to whom you gave policies). Some guy (can't remember his name right now) actually won a Nobel prize in Economics for a partial solution to this problem, which was termed "covariate shift" (because statisticians call our "inputs" "covariates" and they are changing).

There seems to be a strong desire to specify models which (though see the comment below) can be characterized as "train p(y|x) and test p(y|x) are the same, but train p(x) and test p(x) differ." In other words, the "labeling function" is the same, but the distribution over inputs is different. This is probably the clearest way to differentiate domain adaptation from multitask learning (for the latter, we typically assume p(x) stays the same but p(y|x) changes). I'm not sure that this is really a hugely important distinction. It may be to obtain interesting theoretical results, but my sense is that in the problems I encounter, both p(x) and p(y|x) are changing, but hopefully not by "too much." An interesting point made along these lines by Shai Ben David that I spent a bunch of time thinking about several years ago was that from a theoretical perspective, assuming p(y|x) is the same is a vacuous assumption, because you can always take two radically different p(x) and q(x), add a feature that indicates which (p vs. q) the data point came from, and call this the "global p(x)". In fact, in some sense, this is all you need to do to solve multitask learning, or domain adaptation: just add a feature saying which distribution the input is from, and learn a single model. I've been doing some experiments recently and, while you can do better than this in practice with standard learning models, it's not such a bad approach.
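To make the "just add a feature saying which distribution the input is from" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the helper names (`add_domain_indicator`, `pool_domains`), the toy Gaussian data, and the use of plain least squares as the "single model." It is not the setup from my experiments, just the simplest instance of the trick.

```python
import numpy as np

def add_domain_indicator(X, domain_id, n_domains):
    """Append a one-hot 'which distribution did this come from' feature to each row."""
    X = np.asarray(X, dtype=float)
    indicator = np.zeros((X.shape[0], n_domains))
    indicator[:, domain_id] = 1.0
    return np.hstack([X, indicator])

def pool_domains(datasets):
    """Stack (X, y) pairs from several domains into one augmented training set."""
    n_domains = len(datasets)
    Xs = [add_domain_indicator(X, d, n_domains) for d, (X, _) in enumerate(datasets)]
    ys = [np.asarray(y, dtype=float) for _, y in datasets]
    return np.vstack(Xs), np.concatenate(ys)

# Hypothetical toy data: two domains with different p(x) and a shifted labeling function.
rng = np.random.default_rng(0)
X_p = rng.normal(0.0, 1.0, size=(50, 2))
X_q = rng.normal(3.0, 1.0, size=(50, 2))
true_w = np.array([1.0, -1.0])
y_p = X_p @ true_w
y_q = X_q @ true_w + 2.0  # same weights, domain-specific offset

# Learn a single linear model on the pooled, augmented data.
X_all, y_all = pool_domains([(X_p, y_p), (X_q, y_q)])
w, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

# The first two weights are shared; the last two are per-domain offsets.
print(w)  # should be close to [1, -1, 0, 2]
```

The indicator columns let the single model absorb whatever differs between the domains (here, just an offset), while the shared columns pool evidence from both.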

There were several other talks I liked. It seemed that the results (theoretically) were of the form "if p(y|x) is the same and p(x) and p(y) differ only by a 'little' then doing naive things for learning can do nicely." My favorite formalization of p(x) and p(y) differ a little was the Shai Ben David/John Blitzer approach of saying that they differ slightly if there is a single hyperplane that does well (has low error) on both problems. The restriction to hyperplanes is convenient for what they do later, but in general it seems that having a single hypothesis from some class that will do well on both problems is the general sense of what "p(y|x) is the same" is really supposed to mean. I also enjoyed a talk by Alex Smola on essentially learning the differences between the input distributions and using this to your advantage.
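As a rough illustration of the "single hyperplane that does well on both problems" idea, the sketch below scores a few candidate hyperplanes by their error on one problem plus their error on the other, and keeps the best. The function names, the tiny hand-made datasets, and the finite candidate list are all my own assumptions for the example (and it assumes labels are available for both problems); it is not the construction from the Ben David/Blitzer work.

```python
import numpy as np

def error_rate(w, b, X, y):
    """Fraction of points where sign(w.x + b) disagrees with labels y in {-1, +1}."""
    preds = np.where(X @ w + b >= 0, 1, -1)
    return float(np.mean(preds != y))

def best_shared_hyperplane(candidates, X_s, y_s, X_t, y_t):
    """Return (combined_error, (w, b)) minimizing error on problem s plus error on problem t."""
    scored = [
        (error_rate(w, b, X_s, y_s) + error_rate(w, b, X_t, y_t), i)
        for i, (w, b) in enumerate(candidates)
    ]
    best_err, best_i = min(scored)
    return best_err, candidates[best_i]

# Hypothetical data: inputs are distributed differently, but the label depends
# on the sign of the first coordinate in both problems.
X_s = np.array([[-2.0, 0.0], [-1.0, 1.0], [1.0, 0.0], [2.0, -1.0]])
y_s = np.array([-1, -1, 1, 1])
X_t = np.array([[-3.0, 2.0], [-0.5, -1.0], [0.5, 3.0], [4.0, 1.0]])
y_t = np.array([-1, -1, 1, 1])

candidates = [
    (np.array([1.0, 0.0]), 0.0),  # splits on the first coordinate
    (np.array([0.0, 1.0]), 0.0),  # splits on the second coordinate
]

best_err, (best_w, best_b) = best_shared_hyperplane(candidates, X_s, y_s, X_t, y_t)
print(best_err, best_w, best_b)
```

If the best combined error is small, the two problems "differ a little" in this sense; if no candidate does well on both, they don't.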

In general, my sense was that people are really starting to understand this problem theoretically, but I really didn't see any *practical* results that convinced me at all. Most practical results (modulo the Ben David/Blitzer, which essentially cites John's old work) were very NIPSish, in the sense that they were on unrealistic datasets (sorry, but it's true). I wholeheartedly acknowledge that it's somewhat difficult to get your hands on good data for this problem, but there *is* data out there. And it's plentiful enough that it should no longer be necessary to make artificial or semi-artificial data for this problem. (After all, if there weren't real data out there, we wouldn't be working on this problem...or at least we shouldn't :P.)

## 02 January 2007

### Learning when test and train inputs have different distributions -- NIPS workshop

Posted by hal at 1/02/2007 09:31:00 PM

Labels: conferences, domain adaptation, machine learning


## 10 comments:

Reading between the lines, your post suggests to me that you may actually be interested in context-shift, rather than distribution-shift. Here are some relevant links:

Workshop on Learning in Context-Sensitive Domains

Context-Sensitive Learning Bibliography

The management of context-sensitive features: A review of strategies

The identification of context-sensitive features: A formal definition of context for concept learning

These links are a bit old. There may be more recent work, but I haven't kept up with the field.

Hello,

your blog is quite interesting.

You should display your delicious badge on it.

Nicolas

Just some fact checking to help out. The Nobel prize winner in question is James Heckman from U Chicago, in the field of econometrics. Here's the link from their PR department.

We have a paper somewhat related to this in IJCAI-07. It tries to identify those test cases that do not follow the training distribution.

Xiaoli Li, Bing Liu and See-Kiong Ng. "Learning to Identify Unexpected Instances in the Test Set," Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2007.

http://www.cs.uic.edu/~liub/NSF/PSC-IIS-0307239.html

Cheers

Bing

