17 January 2008

Syntax-based and syntax-informed translation

Happy new year, all... Apologies for being remiss about posting recently. (Can I say "apologies about my remission"?) This post is a bit more of a review of what's out there, rather than trying out new ideas. Citations are not meant to get anywhere close to full coverage.

There are a handful of ways of viewing the syntactic MT problem, essentially delineated by which side(s) of the translation does the tree appear on, and whether the tree is induced entirely from data or whether it's induced from a treebank. There's additionally the constituent- versus dependency-tree issue.

  1. String-to-tree (constituent). Here, we have a string on the input and a tree on the output. In Chinese-to-English translation, this would mean that at translation time, we're essentially trying to parse Chinese into an English tree by means of local reordering. This was essentially what Yamada and Knight were doing back in 2001 and 2002. In order to train such a model, we need (a) parallel data and (b) a parser on the target language. Translation is typically done by some variant of CKY.
  2. Tree-to-string (constituent). Here, we map an input tree to an output string. In C2E, this means that at translation time, we first parse the Chinese, then reorder and flatten it out into an English string. This is essentially what the JHU workshop on adding syntactic features to phrase-based translation was trying to do (though in somewhat of a weak way), and also what Liang Huang has been doing lately. In order to train, we need (a) parallel data and (b) a parser on the source side. Translation can be done any way you want, though once parsing is done, it's usually a lot easier than running CKY.
  3. Tree-to-tree (constituent). Here, we map an input tree to an output tree. In C2E, this means that at translation time, we first parse the Chinese, then reorder the tree and translate subtrees/leaves into English. Koehn and Collins have worked on this problem. In order to train, we need (a) parallel data, (b) a parser on the source side and (c) a parser on the target side. Translation is similar to Tree-to-string.
  4. String-to-tree-to-string (constituent). I'm not sure if there's agreed-upon terminology for this task, but the idea here is to translate without having a parser on either side. At translation time, we essentially parse and translate simultaneously (rather like string-to-tree). But at training time, we don't have access to source trees: we have to induce them from just (a) parallel data. This is typified by Wu's inverse transduction grammars and more recently by David Chiang's Hiero system (plus others).
  5. Any of the above but on dependency trees. To my knowledge, the only groups that are seriously working on this are Microsoft (see, eg., Chris Quirk) and our friends at Charles University.
Below is my attempt to categorize the pros and cons of these different approaches along a wide variety of metrics. The metrics are: resource usage (how much labeled data do we need), training ease (computationally), testing ease (computationally), ngram LM (is it easy to include a target ngram language model), syntactic LM (is it easy to include a target syntactic language model), minimal assumptions (does the model make strong assumptions about, for instance, the degree of matchy-ness between source and target?), extensibility (is it easy to add additional features to the input side, such as NE info without breaking things).


S2TT2ST2TS2T2S
resource med med bad good
train-ease good good good bad
test-ease bad good good bad
ngram-lm med good med good
syntax-lm good bad good bad
assumptions good good bad good
extensibility med good good bad

I didn't include dependencies on this list because it essentially depends on what the underlying model is, and then remains the same as in the constituent case. The only caveat here is that syntactic language modeling with dependency trees is not as well proven as with constituency trees.

As far as syntactic language modeling goes, pretty much this is easy if you are producing trees on the target; otherwise, not. Tree-to-tree gets the unique distinction of being bad with respect to assumptions, essentially because you have to do tree-matching which is hard to do well. You could argue that certain forms of s2t2s should be here too (Wellington, Turian and Melamed had a paper on the assumptions that ITG makes at ACL 2006).

In terms of resource usage, T2T is obviously the worst, S2T2S is obviously the best. Whether S2T or T2s is better depends on the language pair.

There's of course the issue of which work best. For this, I don't really know (I'm sure people involved in MTeval and GALE can chime in here, though of course these competitions are always "unfair" in the sense that they also measure engineering ability).

At this point, I'd like to relate this back to the recent discussion of Translation out of English. One thing to notice is that when translating into English, S2T has the advantage that all we need is an English parser (there are a few of those lying around if you search hard enough) whereas T2S requires a foreign parser. However, when translating from English, T2S starts looking a lot more attractive because we have good English parsers. Actually, we have a lot of good annotations for English (NE coref, for instance; discourse parsing to some degree; etc.). From this perspective, the relative ease of extensibility of T2S models begins to look pretty attractive. That is to say, if I were actually to work myself on translation out of English, I would probably seriously consider T2S models. Of course, if you're translating from X to Y, where neither X nor Y is English, then you're kind of hosed and might prefer S2T2S.

10 comments:

Sophiya mary said...

For the Online research and for creating a Database into Microsoft Office knowledge processing is the good online service that i found.

Anonymous said...

I am so happy to get some kal geons and the kal gold is given by my close friend who tells me that the kal online geons is the basis to enter into the game. Therefore, I should kal online gold with the spare money and I gain some kalonline Geons from other players.

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Anonymous said...

酒店上班

酒店兼職

台北酒店

打工兼差

酒店工作

酒店經紀

禮服店

酒店兼差

酒店打工

酒店PT

酒店正職

酒店賺錢

酒店日領

酒店上班請找艾葳 said...

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要打工兼差打工,兼差,八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店日領上班小姐,水水們如果想要擁有打工工作、晚上兼差工作兼差打工假日兼職兼職工作酒店兼差兼差打工兼差日領工作晚上兼差工作酒店工作酒店上班酒店打工兼職兼差兼差工作酒店上班等,想了解酒店相關工作特種行業內容,想兼職工作日領假日兼職兼差打工、或晚班兼職想擁有鋼琴酒吧又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

艾葳酒店經紀是合法的公司工作環境高雅時尚,無業績壓力,無脫秀無喝酒壓力,高層次會員制客源,工作輕鬆,可日領現領
一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

水水們妳有缺現領、有兼職缺錢便服店的煩腦嗎?想到日本留學缺錢嗎?妳是傳播妹??想要擁有高時薪又輕鬆的賺錢,酒店和,假日打工,假日兼職賺錢的機會嗎??想實現夢想卻又缺錢沒錢嗎!??
艾葳酒店台北酒店經紀招兵買馬!!徵專業的酒店打工,想要去酒店的水水,想要短期日領,酒店日領,禮服酒店,制服店,酒店經紀,ktv酒店,便服店,酒店工作,禮服店,酒店小姐,酒店經紀人,
等相關服務 幫您快速的實現您的夢想~!!

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
hersey burada
sesliparti
seslisohbet odalari
Sesli adresi
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet
seslisohbet
sesli chat
seslichat
sesli sohbet sitesi
sesli chat sitesi
sesli sohpet
kamerali sohbet
kamerali chat
webcam sohbet

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex