Comments on natural language processing blog: Domain adaptation vs. transfer learning

Abhishek (2007-12-10):

Hi Hal, have you come across NLP, specifically named entity recognition, applied to the legal domain?

Ani (2007-12-02):

Hi Hal,
I may be wrong, but I feel that if p(x) changes, the only way p(y|x) will not be affected is if the distribution of y does not depend on x at all. So how are DA and TL different?
What's your solution to the microarray normalization problem, if you're thinking of one?
Ani
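A minimal sketch of the distinction Ani's question turns on (the domains, numbers, and conditional below are invented for illustration, not taken from the post): p(x) can shift arbitrarily while p(y|x) stays fixed, even when y depends strongly on x. This is the covariate-shift setting usually identified with domain adaptation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One fixed conditional p(y=1|x) = sigmoid(x), shared by both domains;
# y clearly depends on x. Only the marginal p(x) differs:
# source: x ~ N(-1, 1); target: x ~ N(+2, 1).
x_src = rng.normal(-1.0, 1.0, size=100_000)
x_tgt = rng.normal(+2.0, 1.0, size=100_000)
y_src = rng.random(x_src.size) < sigmoid(x_src)
y_tgt = rng.random(x_tgt.size) < sigmoid(x_tgt)

# The label marginal moves because p(x) moved, yet p(y|x) is identical
# in the two domains by construction.
print("p(y=1) in source:", y_src.mean())  # roughly 0.30
print("p(y=1) in target:", y_tgt.mean())  # roughly 0.84
```

So a shift in p(x) does not by itself touch p(y|x): the two factors of the joint can move independently, and which one moves is exactly what separates the scenarios discussed in the post.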
Kevin Duh (2007-12-02):

Hal, I like your categorization of the various scenarios (e.g., S+T-). However, I think the terms "domain adaptation", "transfer learning", etc. are just words people use to roughly describe their training scenario, and they might not correspond to your definitions.

Here's a question similar to Ryan's: if you can call S-T+ semi-supervised learning, can you call S+T- transductive learning?

I have a more fundamental question that I'm puzzled by: how different do the training and test distributions have to be before you begin to call the problem a domain adaptation or transfer learning problem? For instance, is WSJ -> Brown corpus a DA problem? If so, is WSJ section 1 -> WSJ section 22 a DA problem? If so, is WSJ sentence #2202 -> WSJ sentence #2203 a DA problem? We seem to draw the line between DA/TL and traditional supervised learning somewhat arbitrarily. Can we draw the line in a more principled way? Or, even more, do we need to draw this line at all?

hal (2007-11-30):

ryan --

For the first statement, you're right, it's not :).

I guess the issue is that in transfer learning we want to think of the two tasks in terms of how related they are. Maybe the right way to think of this is as some sort of mutual information: what's the entropy of label2 given x, H(label2 | x), versus given x and label1, H(label2 | x, label1)? There's obviously (in most cases) useful information here. But then you get this compounding problem: since I don't actually know label1, I have to integrate it out when trying to predict label2, which removes any notion of lower entropy.

Anonymous (2007-11-30):

I am confused: in transfer learning, where P(X) does not change, how is S-T+ any different from semi-supervised learning?

The statement "That is, we're assuming that knowing the source label distribution tells us nothing about the target label distribution" is a bit perplexing. I always thought the opposite; otherwise, why do we care about P(Y|X) for S (as you suggest)? When I think of TL, I think of cases like the following: I have a large set of data labeled with definition A of a gene and a small set of data labeled with definition B of a gene. Both annotations are drawn from the same corpus, e.g., MEDLINE. For my problem, I am interested in extracting mentions of genes under definition B. Clearly the labeled corpus A is going to help: even though definitions A and B are not identical, they are at least informative about each other. So how can I use the data annotated with A to help learn a classifier that predicts definition B? But maybe this is more like multi-task learning? I certainly don't think of this as DA, since P(X) hasn't changed.
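One way to make "A and B are informative about each other" operational in the gene example above, offered as a sketch rather than a prescription (the synthetic data, feature dimensions, and choice of logistic regression are all illustrative assumptions): train a classifier for definition A on the large set, then feed its predicted probability to the definition-B learner as an extra feature, a stacking-style form of transfer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: a large corpus labeled with definition A and a
# small corpus labeled with definition B, drawn from the same p(x).
# The two label functions overlap but are not identical.
X_a = rng.normal(size=(5_000, 20))
y_a = (X_a[:, :5].sum(axis=1) > 0).astype(int)                     # "definition A"
X_b = rng.normal(size=(200, 20))
y_b = (X_b[:, :5].sum(axis=1) + 0.5 * X_b[:, 5] > 0).astype(int)   # "definition B"

# Step 1: learn p(label_A | x) from the big A-labeled set.
clf_a = LogisticRegression(max_iter=1_000).fit(X_a, y_a)

# Step 2: append clf_a's predicted probability as an extra feature and
# train the B classifier on the small augmented set.
def augment(X):
    return np.hstack([X, clf_a.predict_proba(X)[:, [1]]])

clf_b = LogisticRegression(max_iter=1_000).fit(augment(X_b), y_b)

# Test time: the same augmentation is applied before predicting B.
X_test = rng.normal(size=(1_000, 20))
pred_b = clf_b.predict(augment(X_test))
```

Because the A label is never observed at test time, the B model conditions on a prediction of it rather than on the label itself; the toy computation below shows what hal's "integrate it out" remark costs in those terms.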
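To put numbers on the entropy comparison in hal's comment above, a toy computation (every probability here is invented for illustration): observing label1 lowers the conditional entropy of label2, but the predictive distribution available when label1 is unobserved, p(y2|x) = sum over y1 of p(y1|x) p(y2|x, y1), has entropy H(label2 | x) again.

```python
import numpy as np

# Toy world: x, label1 (y1), and label2 (y2) are all binary.
p_x = np.array([0.5, 0.5])                      # p(x)
p_y1_x = np.array([[0.9, 0.1],                  # p(y1|x): rows = x,
                   [0.2, 0.8]])                 #          cols = y1
p_y2_xy1 = np.array([[[0.95, 0.05],             # p(y2|x,y1),
                      [0.30, 0.70]],            # indexed [x][y1][y2]
                     [[0.60, 0.40],
                      [0.10, 0.90]]])

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# H(Y2 | X, Y1): average entropy of p(y2|x,y1) under p(x, y1).
h_given_xy1 = sum(p_x[x] * p_y1_x[x, y1] * entropy(p_y2_xy1[x, y1])
                  for x in range(2) for y1 in range(2))

# With y1 unobserved, the usable predictive is the mixture
# p(y2|x) = sum_y1 p(y1|x) p(y2|x,y1); its average entropy is H(Y2|X).
p_y2_x = np.einsum('xa,xab->xb', p_y1_x, p_y2_xy1)
h_given_x = sum(p_x[x] * entropy(p_y2_x[x]) for x in range(2))

print("H(Y2|X,Y1) =", round(h_given_xy1, 3))  # lower: label1 is informative
print("H(Y2|X)    =", round(h_given_x, 3))    # what remains once y1 is integrated out
```

This is the compounding problem hal describes: the information in label1 is real, but averaging over an unknown label1 hands it back. What the related task can still buy is a better estimate of these conditionals from limited data.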