We NLPers face this problem all the time: we have training data from one domain/genre but really want to work in another. E.g., the treebank is WSJ text, but we care about email or web pages or whatever. We'd like to be able to intelligently use our annotated WSJ text to get a good statistical model for a different domain.

I've been working on this problem for a while now and have a partial solution. I'm most interested in the case where we have lots of annotated "out of domain" (OOD) data and a little annotated "in domain" (ID) data. "Domain Adaptation for Statistical Classifiers" is a paper that's been accepted to JAIR and presents one way to model this problem. The key idea is to model the OOD/ID data distributions as mixtures. There are three mixture components: a "truly ID" distribution, a "truly OOD" distribution and a "general" distribution. We say the OOD data comes from a mixture of "truly OOD" and "general," while the ID data comes from a mixture of "truly ID" and "general." The learning task is to tear apart our data sets to figure out what "general" (and hence relevant to the ID task) information there is in the OOD data.
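To make the mixture story concrete, here is a minimal sketch of the generative side only, not the model or learning procedure from the paper. The three components are stand-in 1-D Gaussians, and the names `sample_domain`, `truly_ood`, `truly_id`, `general`, and the mixing weight `0.6` are all assumptions chosen for illustration. In the real problem the "which component did this example come from?" indicator is hidden; it is visible here only because we are simulating.

```python
import random

def sample_domain(n, specific, general, p_general, rng):
    """Draw n examples for one domain.

    Each example comes from the shared `general` component with
    probability p_general, and from the domain's own `specific`
    component otherwise. Returns (value, came_from_general) pairs.
    """
    data = []
    for _ in range(n):
        use_general = rng.random() < p_general
        source = general if use_general else specific
        data.append((source(rng), use_general))
    return data

# Hypothetical toy components (assumptions, not from the paper).
def truly_ood(rng):
    return rng.gauss(-2.0, 1.0)

def truly_id(rng):
    return rng.gauss(2.0, 1.0)

def general(rng):
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)
# Lots of OOD data, a little ID data, as in the setting described above.
ood_data = sample_domain(1000, truly_ood, general, 0.6, rng)
id_data = sample_domain(50, truly_id, general, 0.6, rng)

# Fraction of OOD examples that are actually "general", i.e. the part
# of the OOD data that should be useful for the ID task.
ood_general_frac = sum(1 for _, g in ood_data if g) / len(ood_data)
print(f"general share of OOD data: {ood_general_frac:.2f}")
```

The learning task is the reverse of this simulation: given only the values and their domain (OOD or ID), recover which examples look "general" and use those to help the ID model.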

The framework, when applied to maximum entropy models, gives relatively simple update equations for model parameters. The derivations take a bit of thought, but are not insane. The approach is a partial solution because it's limited to maximum entropy models. I think the problem should be amenable to a more learning-theoretic analysis, but I haven't had time to make much headway here.

I'm a bit surprised this problem hasn't gotten more attention in the NLP community (a similar problem -- speaker adaptation -- exists in the speech community). Or perhaps I've missed it in NLP. It seems like we, as NLPers, should really care about this issue.

## 25 January 2006

### Domain Adaptation

Posted by hal at 1/25/2006 11:31:00 AM


## 7 comments:

I am very interested in this problem now. With the growth of data on the web, it becomes important to identify the portions of a corpus that are useful for your task.

i've recently talked to drew bagnell about some work he's done in a related problem: robust supervised learning. the idea in robust learning is that you don't trust that your test distribution is identical to your training distribution, but the difference is bounded (in a KL sense). this isn't the same as classifier migration, but they're related. i think there might be some room for marrying the techniques.
