03 May 2006

Supervised Hidden Variable Models

Hidden variable models have been extremely successful in unsupervised NLP problems, such as the famous alignment models for machine translation. Recently, two parsing papers and a textual entailment paper have applied hidden variable techniques to supervised problems. I find this approach appealing because it allows one to learn hidden variable models that are (hopefully) optimal for the end goal. This contrasts with, for example, unsupervised word alignment, where it is unclear whether the same alignment is good for two different tasks, for instance translation or projection.

The question that interests me is "what exactly are hidden variables in these models?" As a simple example, let's take the following problem: we have an English sentence and an Arabic sentence and we want to know if they're aligned. Dragos has a paper that shows that we can do a good job of making this determination if we do the following: construct an alignment (using, e.g., GIZA++) and then compute features over the alignment (length of aligned subphrases, number of unaligned words, etc.). One can imagine similar models for textual entailment, paraphrase identification, etc.

I would ideally like to learn the alignments as part of training the classifier, treating them essentially as hidden variables in the supervised problem.

So what happens in such a model? We learn to produce alignments and then define features over the alignments. From an information theoretic perspective, we needn't bother with the alignments: we can just define features on the original sentence pair. Perhaps these features will be somewhat unnatural, but I don't think so. To build the alignments, we'll have features like "the word 'man' is aligned to the word 'XXX'" (where 'XXX' is Arabic for 'man'). Then on top of this we'd have features that count how many words are successfully aligned. It's somewhat unclear if this is much more than just including all word pair features for a pure classification technique.

The key difference appears to be that we impose a strong constraint on the alignments: for instance, that they are one-to-one, one-to-many, or something similar. This significantly reduces the number of features that are fed into the classifier.
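To make the feature-count argument concrete, here is a minimal sketch that contrasts the two views on a toy sentence pair. Everything in it is an assumption for illustration: the greedy aligner stands in for a real aligner such as GIZA++, the `score` table is a hypothetical hand-written lexicon, and the `F_...` tokens are placeholders for foreign words rather than real Arabic.

```python
from itertools import product


def all_pair_features(src, tgt):
    """Pure classification view: one indicator feature per word pair."""
    return {(s, t) for s, t in product(src, tgt)}


def greedy_one_to_one(src, tgt, score):
    """Greedy one-to-one alignment: repeatedly take the best unused pair.

    `score` is a hypothetical lexicon mapping (src_word, tgt_word) to a
    positive weight; pairs missing from it are never linked.
    """
    candidates = sorted(
        ((score.get((s, t), 0.0), i, j)
         for i, s in enumerate(src)
         for j, t in enumerate(tgt)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for weight, i, j in candidates:
        if weight <= 0.0:
            break  # remaining candidates have no lexical support
        if i in used_src or j in used_tgt:
            continue  # one-to-one constraint: each word links at most once
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j))
    return sorted(links)


def alignment_features(src, tgt, links):
    """Features computed on top of the alignment, not on raw word pairs."""
    return {
        "aligned": len(links),
        "unaligned_src": len(src) - len(links),
        "unaligned_tgt": len(tgt) - len(links),
    }


# Toy data (placeholders, not real Arabic).
src = ["the", "man", "sleeps"]
tgt = ["F_man", "F_sleeps"]
score = {("man", "F_man"): 0.9, ("sleeps", "F_sleeps"): 0.8, ("the", "F_man"): 0.1}

links = greedy_one_to_one(src, tgt, score)
print(len(all_pair_features(src, tgt)), "word-pair features")
print(len(links), "alignment links:", links)
print(alignment_features(src, tgt, links))
```

The unconstrained view fires len(src) * len(tgt) word-pair features, while a one-to-one alignment can activate at most min(len(src), len(tgt)) links. Note that "the" cannot link to "F_man" here even though the lexicon gives it some weight, because "man" has already claimed that word.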

At the same time, the hidden variable models look a lot like introducing a hidden layer into a neural network! Consider a drastic simplification of the parallel sentence detection problem (which is, in the end, just a binary classification problem). Namely, consider the XOR problem. As we all know, we can't solve this with a linear classifier. But if we add a hidden variable to the mix, we can easily find a solution (e.g., the hidden variable indicates left-versus-right and then, conditional on this, we learn two separate classifiers, one pointing up and one pointing down). This is exactly what a two-layer neural net (i.e., one hidden layer) would do!
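The XOR argument can be checked with a small sketch. It makes two simplifying assumptions: the hidden variable and the two conditional classifiers are set by hand rather than learned, and the claim about linear classifiers is only illustrated by a brute-force search over a coarse weight grid, not proved.

```python
from itertools import product

# XOR data: the label is 1 when exactly one of the two inputs is 1.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def linear_predict(w, b, x):
    """A single linear threshold unit."""
    return int(w[0] * x[0] + w[1] * x[1] + b > 0)


def hidden_side(x):
    """Hidden variable: 0 for the left half (x1 = 0), 1 for the right half."""
    return int(x[0] > 0.5)


# One linear classifier per value of the hidden variable (hand-set weights).
EXPERTS = {
    0: ((0.0, 1.0), -0.5),   # left: "pointing up", fires when x2 = 1
    1: ((0.0, -1.0), 0.5),   # right: "pointing down", fires when x2 = 0
}


def hidden_variable_predict(x):
    """Pick the classifier selected by the hidden variable, then apply it."""
    w, b = EXPERTS[hidden_side(x)]
    return linear_predict(w, b, x)


def accuracy(predict):
    return sum(predict(x) == y for x, y in DATA) / len(DATA)


def best_linear_accuracy(grid):
    """Best accuracy of any single linear classifier with weights on the grid."""
    return max(
        accuracy(lambda x, w=(w1, w2), b=b: linear_predict(w, b, x))
        for w1, w2, b in product(grid, repeat=3)
    )


grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
print("best single linear classifier:", best_linear_accuracy(grid))
print("with hidden variable:", accuracy(hidden_variable_predict))
```

On this grid no single linear classifier gets more than three of the four points right, while the pair of classifiers selected by the hidden variable gets all four.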

From this perspective, it seems one can argue that hidden variable models are simply increasing the model complexity of the underlying learner. This comes at a cost: increased model complexity means we either need more data or stronger prior information. Hidden variable models --- more precisely, the constraints present in hidden variable models --- seem to provide this prior information.

11 comments:

Bob Carpenter said...

HMMs as used in speech recognition typically hide the following two items during training: (a) the state [typically begin, middle, and end states for each phoneme or tri-phone], and (b) mixture state for acoustic emissions. Training data includes acoustic streams paired with lexical token sequences; sometimes with phonemic level transcription and/or alignment, both of which leave most of the fine-grained state alignment still hidden.

hal said...

Yup, roughly the same thing happens in the original word-based MT systems (the phrase- and syntax-based newcomers are much more complicated). The question is: what does this buy us?

Nobuyuki Shimizu said...

Isn't it the unlabeled data? Sort of like Ando & Zhang
( http://www-cs-students.stanford.edu/%7Etzhang/papers/jmlr05_semisup.pdf ) that was mentioned a bit ago?

By the way, I left a pragmatics example in the past comment.

Kevin said...

I think there should be a distinction between hidden variables in generative vs. discriminative models.

In discriminative models like neural networks, the hidden variables can be used to increase model complexity/flexibility, as you said. In fact, the hidden layer can be thought of as an implicit feature mapping similar to the high-dimensional kernels used in SVMs and other kernel machines.

However, the role of hidden variables in generative models is different, and applications such as the IBM models for word alignment and HMMs for speech recognition fall under this category. In generative models, obtaining an accurate model of the underlying phenomenon is extremely important, since an incorrect model trained via maximum likelihood results in poor classification. Thus, hidden variables are introduced to build more parsimonious and accurate models. This is the same reason statisticians do hierarchical and mixture modeling: often introducing a hidden variable or two will (a) give a more accurate generative model, and (b) allow fewer parameters or simpler parametric distributions.

In the speech recognition research in our lab, we try to think of all sorts of interesting variables to add to our dynamic Bayesian networks so as to model the speech process more accurately. Some of them are hidden, and some of them are deterministic given other variables. In effect, this kind of work is analogous to feature engineering for discriminative models. So I think hidden variables are important in generative models, just as features are important in discriminative models.

That being said, I know there's an effort to develop discriminative models that utilize hidden variables. I think they have a different goal, though--e.g. using hidden variables to model missing features/labels. Anyone know more on this?

hal said...

kevin -- I agree that hidden units in NNs are different, especially in the generative world. That is, in graphical models, hidden variables have a meaning. Typically they do not in NNs. But I have a feeling that when you do generative-style hidden variables in discriminative techniques, like the cited Koo and Collins paper, what you end up with is a lot like a restricted hidden unit in a neural network. I want to understand better what this means.

In essence, your closing question is exactly what I'm interested in.
