Comments on natural language processing blog: "Translation out of English"

Anonymous (2007-12-09 19:58):
I would disagree with you on the quality of automatic translation into English: it is still not reliable enough. I do agree that what n-gram models have yielded is pretty impressive, but we may be close to the limits of what we can do with them.

Anonymous (2007-12-08 06:26):
A translation demo between English and morphologically very rich Czech: https://blackbird.ms.mff.cuni.cz/cgi-bin/bojar/mt_cgi.pl

Anonymous (2007-11-26 12:30):
You make a lot of valid points. The commercial interest in English-X may be even bigger than in X-English, but the funding situation in the US is different. Fortunately, things are better here in the old world.

I have found, in the translation of European languages, that morphology is one of the main reasons why translation into a language is worse than translation out of it: generating morphology is much harder than translating it away. I don't think this is just an artefact of the BLEU score.

Now it's time to plug my Europarl (2005) paper and the recent work on factored models...

Anonymous (2007-11-21 05:50):
There has been some research in our lab on adapting n-grams for Finnish (and related languages), which is morphologically rich. The approach has been to segment words automatically into unsupervised morpheme-like units, which improves n-gram performance significantly for morphologically rich languages. The segmentation was also applied to statistical MT, for which Finnish is a difficult source and target language; there, however, the scores did not improve as much. The first paper below seems to have applied the method to Arabic as well.

Some papers, in case you're interested:
http://www.cis.hut.fi/vsiivola/papers/creutz07naacl.pdf
http://www.cis.hut.fi/svirpioj/papers/virpioja07mtsummit.pdf

Dave Stallard (2007-11-20 16:11):
Hal,

A lot of people think that the TransTac domain is very narrow, but that's not true, at least not compared with other speech systems such as dialog systems. Quite a wide range of topics is covered within its purview, and the Arabic vocabulary is about 75K. In fact, all of the surviving systems in the program use statistical machine translation as their primary mechanism.

Granted, it is not as broad as news (GALE), nor as syntactically demanding in terms of getting complex clausal structure right. I don't know how E2A translation would do for that domain, though it would probably not be totally awful.

Great blog, by the way.

hal (2007-11-20 06:35):
kevin --

That usage list is pretty cool... it would have been great if they could have worked the % increase into the graph itself, but it's pretty amazing.

As for direct versus bridge translation, I guess it would depend primarily on whether you have actual direct parallel data. If you don't, you're pretty much hosed and have to do bridge translation. Even with some parallel data, it might be better to do a combination (I wouldn't find this surprising at all). One question is whether you can do anything more interesting than just X1 -> N-best-E, then N-best-E -> N^2-best-X2, and then rerank.

dave --

I had forgotten about this project, but my sense is that speech-to-speech in the case of TransTac is a very, very limited domain, and that classification-based and small rule-based systems actually do quite well there. Are the results you're quoting for E2A vs. A2E on this domain, or for more general text-to-text in (e.g.) news? If the former, then my guess is that there isn't actually much "generation" going on, which may explain away some of the good performance.

The point about evaluation metrics is very true.

Dave Stallard (2007-11-20 05:36):
In fact, another (much smaller) DARPA program, the TransTac speech-to-speech translation project, IS looking at translation from English to (Iraqi) Arabic. The motivation is to allow full two-way communication between the parties.

Yes, the morphological complexity of Arabic does cause a problem, but mainly for the metrics rather than for the translation itself, it would seem. BLEU and TER both show much worse performance for E2A than for A2E, but subjective Likert-scale evaluation shows the two translation directions performing about the same in many evals.

Kevin Duh (2007-11-19 21:19):
I think another reason for working on other translation pairs is commercial. Just look at the percentage increase of Chinese, Portuguese, and Arabic speakers on the Internet:

http://www.internetworldstats.com/stats7.htm

There's got to be interest in translation from English to X, as well as in translation pairs not involving English. The latter is yet another intriguing MT research problem: should we do bridge translation (X1 -> English, English -> X2), direct translation (X1 -> X2), or a combination?
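The unsupervised morpheme-like segmentation mentioned in the Finnish comment (the Creutz et al. / Morfessor line of work) is MDL-based; as a simpler stand-in, the hypothetical sketch below uses byte-pair-encoding-style merges just to illustrate the pipeline shape: learn sub-word merges from a word list, segment words into units, and then train the n-gram model over those units instead of whole words.

```python
from collections import Counter

def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with its concatenation."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return tuple(out)

def learn_merges(words, num_merges):
    """Learn BPE-style merge operations from a list of word tokens.

    This is NOT Morfessor's MDL algorithm -- just a minimal illustration of
    learning sub-word units without supervision.
    """
    # Each word is a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq, freq in vocab.items():
            for pair in zip(seq, seq[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for seq, freq in vocab.items():
            new_vocab[merge_pair(seq, best)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split a word into sub-word units by replaying the learned merges."""
    seq = tuple(word) + ("</w>",)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return list(seq)

# Toy "Finnish-flavoured" word list (hypothetical data, for illustration only).
words = ["talossa", "talossa", "talon", "autossa", "auton", "auton"]
merges = learn_merges(words, 8)
units = segment("junassa", merges)
# Merging only concatenates adjacent units, so the units always
# reconstruct the original word (plus the end marker).
assert "".join(units) == "junassa</w>"
```

An n-gram model trained over the resulting unit stream (with `</w>` marking word boundaries) sees far fewer out-of-vocabulary events than a word-level model, which is the effect the cited papers exploit for Finnish.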
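Kevin's bridge-vs-direct question, and the baseline Hal sketches in his reply (X1 -> N-best-E, then N-best-E -> N^2-best-X2, then rerank), can be written down in a few lines. In the sketch below the translation and language-model functions are hypothetical stand-ins assumed to return log-scored hypotheses; a real system would plug in two trained MT systems and tune the component weights.

```python
import heapq

def bridge_translate(src, x1_to_en, en_to_x2, lm_score, n=10):
    """Pivot translation X1 -> English -> X2 with n-best reranking.

    x1_to_en(text, n) and en_to_x2(text, n) are assumed to return lists of
    (hypothesis, log_score) pairs; lm_score(text) is a target-language-model
    log-probability. All three are hypothetical stand-ins.
    """
    candidates = []
    for en, s1 in x1_to_en(src, n):        # n English pivot hypotheses
        for x2, s2 in en_to_x2(en, n):     # n targets each -> n^2 candidates
            # Rerank by the sum of log-scores; a real system would tune
            # interpolation weights for the three components (e.g. by MERT).
            candidates.append((s1 + s2 + lm_score(x2), x2, en))
    # Return the n highest-scoring (score, target, pivot) triples.
    return heapq.nlargest(n, candidates)

# Toy stand-in systems, for illustration only.
def toy_x1_to_en(text, n):
    return [("hello world", -1.0), ("hi world", -2.0)][:n]

def toy_en_to_x2(text, n):
    return [(text.replace("world", "welt"), -0.5),
            (text.replace("world", "erde"), -1.5)][:n]

best = bridge_translate("hallo wereld", toy_x1_to_en, toy_en_to_x2,
                        lambda t: 0.0, n=2)
# best[0] is (-1.5, "hello welt", "hello world")
```

The interesting follow-up in the thread is whether one can beat this simple cascade, for instance by combining bridge and direct models when some direct X1-X2 parallel data exists.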