25 January 2006

Domain Adaptation

We NLPers face this problem all the time: we have training data from one domain/genre but really want to work in another. Eg., the treebank is WSJ text but we care about email or web pages or whatever. We'd like to be able to intelligently use our annotated WSJ text to get a good statistical model for a different domain.

I've been working on this problem for a while now and have a partial solution. I'm most interested in the case that we have lots of annotated "out of domain" (OOD) data and a little annotated "in domain" (ID) data. Domain Adaptation for Statistical Classifiers is a paper that's been accepted to JAIR that presents one way to model this problem. The key idea is to model the OOD/ID data distributions as mixtures. There are three mixture components: a "truly ID" distribution, a "truly OOD" distribution and a "general" distribution. We say the OOD data comes from a mixture of "truly OOD" and "general," while the ID data comes from a mixture of "truly ID" and "general." The learning task is to tear apart our data sets to figure out what "general" (and hence relevant to the ID task) information there is in the OOD data.

The framework, when applied to maximum entropy models, gives relatively simple update equations for model parameters. The derivations take a bit of thought, but are not insane. The approach is a partial solution because it's limited to maximum entropy models. I think the problem should be amenable to a more learning theoretic analysis, but haven't had time to make much headway here.

I'm a bit surprised this problem hasn't gotten more attention in the NLP community (a similar problem -- speaker adaptation -- exists in the speech community). Or perhaps I've missed it in NLP. It seems like we, as NLPers, should really care about this issue.

7 comments:

Deepak said...

I am very interested in this problem now. With the growth of the data in WWW it becomes important to identify certain portions of the corpus that are useful for your task.

hal said...

i've recently talked to drew bagnell about some work he's done in a related problem: robust supervised learning. the idea in robust learning is that you don't trust that your test distribution is identical to your training distribution, but the difference is bounded (in a KL sense). this isn't the same as classifier migration, but they're related. i think there might be some room for marrying the techniques.

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Adi said...

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex