12 May 2008

Teaching machine translation

Last Fall (2007), I taught an Applications of NLP course to a 50/50 mix of grads and senior undergrads. It was modeled partially after a course that I took from Kevin Knight while a grad student. It was essentially 1/3 on finite state methods for things like NER and tagging, then 1/3 on machine translation, then 1/3 on question answering and summarization. Overall, the course went over fairly well.

I had a significant problem, however, teaching machine translation. Here's the problem.

Students knew all about FSTs because we used them to do all the named-entity stuff in the first third of class. This enabled us to talk about things like IBM Model 1 and the HMM model. (There's a technical difficulty here, namely dealing with incomplete data, so we talk about EM a little bit.) We discuss, but they don't actually make use of, higher-order MT models.
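For readers who want to see how little machinery the word-based story needs, here is a minimal sketch of EM training for IBM Model 1. It is not the course's code: the function name `train_ibm_model1`, the `NULL` token string, and the toy German-English bitext are all assumptions made for illustration. The E-step spreads each source word's count over the target words in its sentence in proportion to the current translation table, and the M-step renormalizes those expected counts.

```python
from collections import defaultdict

def train_ibm_model1(bitext, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM.

    bitext: list of (source_tokens, target_tokens) pairs.
    The names and the NULL convention here are illustrative assumptions.
    """
    # A NULL target token lets source words align to "nothing".
    bitext = [(list(f_sent), ["NULL"] + list(e_sent)) for f_sent, e_sent in bitext]
    f_vocab = {f for f_sent, _ in bitext for f in f_sent}

    # Uniform initialization over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e generating anything

        # E-step: the alignment is the missing data, so collect posteriors.
        for f_sent, e_sent in bitext:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p

        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

if __name__ == "__main__":
    toy = [("das Haus".split(), "the house".split()),
           ("das Buch".split(), "the book".split()),
           ("ein Buch".split(), "a book".split())]
    table = train_ibm_model1(toy)
    for f in ("das", "Haus", "Buch"):
        print(f, "| the ->", round(table[(f, "the")], 3))
```

Everything here follows from one generative story and one objective, which is exactly the prettiness that is hard to give up later.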

Now, we all know that there's a lot more to MT than Model 4 (even limiting oneself to statistical translation techniques). Namely, there are phrase-based models and syntactic models. We had a very brief (one lecture) overview of syntactic models at the end. My beef is with phrase-based models.

The problem is that we've gone through all this prettiness to develop these word-based models, and then I have to teach them grow-diag-final, phrase extraction and phrase scoring. I almost felt embarrassed doing so. The problem is that these things are obviously so heuristic that throwing them on top of this really pretty word-for-word translation model just kills me. And it's not just me: the students were visibly upset by the lack of real modeling behind these techniques.
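To make the contrast concrete, here is a rough sketch of the phrase extraction and scoring steps as they are commonly described, not a faithful copy of any particular toolkit. It assumes a symmetrized word alignment is already given (for example, the output of something like grow-diag-final, which is not implemented here). The function names `extract_phrases` and `score_phrases`, the `max_len` parameter, and the decision not to extend phrases over unaligned boundary words are all simplifying assumptions.

```python
from collections import Counter

def extract_phrases(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs meaning src[i] is linked to tgt[j].
    Simplification: phrases are not extended over unaligned boundary words.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency check: no word inside the target span may be
            # linked to a source word outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

def score_phrases(all_pairs):
    """Relative frequency: phi(tgt | src) = count(src, tgt) / count(src)."""
    pair_counts = Counter(all_pairs)
    src_counts = Counter(s for s, _ in all_pairs)
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}

if __name__ == "__main__":
    pairs = extract_phrases(["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)})
    print(pairs)
    print(score_phrases(pairs))
```

Note that nothing here is the maximizer of a likelihood: the pairs are whatever survives a consistency check, and the scores are counts over overlapping extractions. That is the gap between this and the word-based models.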

One option would be just not to teach this stuff. I don't really think that it sheds much light on the translation process. The reason I don't like this solution is that it's nice to be able to say that they will have a handle on a not-too-difficult to understand/implement method for doing real-world MT. Instead, I could just spend that time on syntactic models. The situation there is better (you can talk about the hierarchy of tree transducers, etc.), but not perfect (e.g., all the work that goes into rule extraction is not too dissimilar from all the work that goes into phrase extraction).

I suppose that this is just the de facto problem with a relatively immature field: there hasn't been enough time for us to really tease apart what's actually going on in these models and try to come up with some coherent story. I'd love a story that doesn't involve first doing word alignment and is, in some sense, integrated.

19 comments:

DesiLinguist said...

Hal,

I think you bring up a really good point because I ran into a similar issue when teaching an 'intro to NLP' class last fall to first-year graduate students. I have always thought that more recent "state-of-the-art" MT techniques sacrifice model elegance for practicality. However, it is hard to dispute the fact that the newer models do perform "better" translation than the older, more elegant models. I think the idea of scaling up to phrases as units of translation is a step in the right direction, but its current execution is an ad hoc one nonetheless.

Playing devil's advocate for a minute, I could imagine how this could be beneficial for students. They could learn that theoretically elegant models do not always perform better than quick and dirty approximations. Take the Och et al. (2004) smorgasbord paper, for instance.

Ramkumar G said...

Hello,

I am really interested in NLP and AI. I am stuck in a bank doing some brain-dead coding...

I want to get into NLP really badly. Would you be kind enough to let me know where I can start? Not necessarily a paying job... Any starting point.

Ram.

David said...

You could teach Marcu and Wong's model or the work that John DeNero is currently doing. They don't beat the heuristics but are a lot more interesting. Then you could mention the heuristics in 5 minutes and lament that the elegant models don't beat them -- which is in itself a very important lesson.

Or, maybe you can figure out a new method so you'll have something to teach next year :)

ccb said...

I second David's motion about teaching Marcu and Wong, as a way of transitioning into phrase-based translation without sacrificing the elegance of the word-based models.

Also, you might consider using Philipp Koehn's draft textbook for a reader.

DesiLinguist said...

Actually, I am not sure why I forgot to mention Marcu and Wong's joint phrase-based model. I second that as well :)

hal said...

This is annoying -- for some reason Blogger stopped emailing me notifications of comments to posts, so I didn't know people had written!

I am actually using Philipp's book, but it (or at least the version I have) doesn't do Marcu and Wong. This is actually quite a good suggestion, though!

Ali Reza Ebadat said...

Hi
As a master's student at Saarland University, I'm interested to know how you are teaching NLP to your students. I wanted to compare it with my course, but I found that your slides are not available to me. Why? At Saarland University, all slides in Computational Linguistics are available to everyone without a user name and password.
I think it would be a good idea to allow other people to see your slides and try to understand your ideas.
Anyway, thank you for sharing NLP things on your weblog.
Saarland Computational Linguistics course list and slides
Best
Ali Reza Ebadat

hal said...

Ali -- What can't you access? The only thing that I think is passworded is Philipp Koehn's SMT book chapters, which I can't make public. Everything else should be accessible by anyone. (Note that there aren't many slides because I don't really like teaching with slides.)

Ali Reza Ebadat said...

Sorry, it was my mistake: I thought the SMT material was your slides. I didn't realize that it is from the SMT textbook.
I'm reading your weblog because it is useful for me and I can find fresh information about NLP.
