29 September 2008

Statistical Machine Translation Papers at COLING

(guest post) A major conference is not always a short train ride away, so I went down to Manchester even though I had no real business being at COLING this year. Liang Huang gave an interesting tutorial in which he tied together a number of natural language problems and the types of dynamic programming solutions for them. A nice treatment of this area, with examples from tree-based statistical machine translation, of course. There were also a lot of very strong papers, so let's take a look at them.

Syntax-Based SMT is alive: Currently one of the most exciting areas of statistical machine translation is the development of tree-based models, with all the open questions: Real syntax in the source, in the target? Which grammar formalism? How do we parameterize the models? And all this efficiently, please! When using real syntax on both sides, many of the rules in standard phrase-based models no longer match the syntactic annotations, so they have to be dropped. To accommodate more rules, Zhang et al. propose extending the synchronous grammar formalism to allow multi-headed rules. He et al. build maximum entropy models for rule selection in hierarchical phrase models, similar to recent work by Carpuat and Wu in phrase-based models. Additional source-side context may of course be a syntactic parse tree. Xiong et al. use features about syntactic spans in their translation model and maximum entropy reordering model in a BTG framework. Zhang et al. present an algorithm to extract rules from a word-aligned corpus in linear time. The trick is to build up a tree structure that compactly encodes all possible rules. Storing all these rules is a real problem, too, since they may take up terabytes, but Lopez presents a handy solution: suffix arrays. He also shows that rules that cover longer spans do help in hierarchical models, even if long phrases do not matter much in phrase-based models. Finally, Zollmann et al. present detailed results from their time at Google, where they compared their syntax-augmented model against a hierarchical and a standard phrase model. So, syntax has arrived at Google at last.
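To make the suffix array idea concrete, here is a minimal sketch of looking up phrase occurrences in a tokenized corpus instead of storing every rule up front. It is not Lopez's implementation: the function names are made up for illustration, the construction is the naive sort-all-suffixes version, and a real system would work on the source side of a parallel corpus and then extract rules from the matched sentences.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes, sorted by token sequence.

    Naive construction (sorts full suffixes); fine for a toy corpus only.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, suffix_array, phrase, upper):
    """Binary search for the first suffix whose prefix is >= phrase
    (or > phrase when upper is True)."""
    n = len(phrase)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        window = tokens[start:start + n]  # suffix truncated to phrase length
        if window < phrase or (upper and window == phrase):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(tokens, suffix_array, phrase):
    """Return the sorted corpus positions at which phrase (a token list) starts."""
    first = _bound(tokens, suffix_array, phrase, upper=False)
    last = _bound(tokens, suffix_array, phrase, upper=True)
    return sorted(suffix_array[first:last])


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat slept".split()
    sa = build_suffix_array(corpus)
    print(find_occurrences(corpus, sa, ["the", "cat"]))  # [0, 7]
```

All suffixes that start with the same phrase sit next to each other in the sorted array, so two binary searches find every occurrence without a precomputed phrase table.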

Back to the Basics: Statistical machine translation systems have evolved into a fairly long pipeline of steps, and many of them deserve more attention. Take, for instance, the phrase segmentation of the input. Does it matter? The standard approach uses uniform segmentation, and maybe a phrase count feature. Blackwood et al. suggest building an explicit phrasal segmentation model, which takes the form of a smoothed bigram phrase model. Moore and Quirk examine everybody's favorite: minimum error rate training (MERT), the step used to fine-tune the weights given to the various model components. This is typically done by picking a random starting point and then hill-climbing to a better weight setting to optimize BLEU. Since this method easily gets stuck in local optima, it is repeated several times. Moore and Quirk promise better merting (yes, that is a verb!) with fewer random restarts and better selection of the starting points through a random walk. The n-best list of candidate translations used during merting or re-ranking may also be improved. Chen et al. suggest adding entries using a confusion network or by concatenating n-grams found in candidate translations. Speaking of confusion networks, they are also used in combining different system outputs (which is the current key method to win competitions, even though it was bad-mouthed as intellectually bankrupt at the recent NIST evaluation meeting). Ayan et al. suggest that better confusion networks and better system combination results may be achieved by considering synonyms found in WordNet when matching up words to align the different outputs. Zwarts and Dras show that system combination could also be done by building a classifier that chooses one of the outputs for each sentence.
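The restart loop described above can be sketched on a toy n-best list. This is an illustration only, under several assumptions: real MERT uses an exact line search along each direction rather than the random coordinate perturbation used here, the per-hypothesis quality scores stand in for a corpus-level metric like BLEU, and all names and numbers are invented. Moore and Quirk's random-walk selection of starting points is not implemented.

```python
import random


def corpus_score(weights, nbest_lists):
    """Sum the quality of the hypothesis each sentence's model score ranks first.

    Each hypothesis is a pair (feature_vector, quality); quality is a
    stand-in for its contribution to the evaluation metric.
    """
    total = 0.0
    for hyps in nbest_lists:
        top = max(hyps, key=lambda h: sum(w * f for w, f in zip(weights, h[0])))
        total += top[1]
    return total


def hill_climb(start, nbest_lists, rng, steps=200, step_size=0.5):
    """Greedy search: perturb one weight at a time, keep strict improvements."""
    best_w = list(start)
    best = corpus_score(best_w, nbest_lists)
    for _ in range(steps):
        cand = list(best_w)
        k = rng.randrange(len(cand))
        cand[k] += rng.uniform(-step_size, step_size)
        score = corpus_score(cand, nbest_lists)
        if score > best:
            best_w, best = cand, score
    return best_w, best


def tune(nbest_lists, dim, restarts=10, seed=0):
    """Run hill climbing from several random starting points, keep the best."""
    rng = random.Random(seed)
    best_w, best = None, float("-inf")
    for _ in range(restarts):
        start = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        w, score = hill_climb(start, nbest_lists, rng)
        if score > best:
            best_w, best = w, score
    return best_w, best


# Toy data: three sentences, two hypotheses each, two features per hypothesis.
TOY_NBEST = [
    [((1.0, 0.0), 0.2), ((0.0, 1.0), 0.9)],
    [((0.5, 0.5), 0.6), ((1.0, 0.2), 0.1)],
    [((0.2, 0.9), 0.8), ((0.9, 0.1), 0.3)],
]

if __name__ == "__main__":
    weights, score = tune(TOY_NBEST, dim=2)
    print(weights, score)
```

Because the objective is piecewise constant in the weights, a single climb can stall on a plateau, which is exactly why the restarts are there.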

Reordering: Two papers on reordering, one of the big unsolved problems in statistical machine translation. Elming shows some advances in the first-reorder-into-lattice-then-decode approach, specifically how this lattice should be scored: he argues for scoring the final output to check which rules were implicitly followed (either by applying the rule or using a phrase translation that has it internalized). Zhang et al. propose that different reordering models should be built for different types of sentences, such as statements vs. questions.

Neat and surprising: We typically train our system on corpora that we find on web sites of multilingual institutions, such as the UN or the EU. When using such data, does it matter what the original source language was? I doubt it, but then van Halteren shows that he can detect the original source language in English Europarl documents with over 96 percent accuracy.

Connecting with our rule-based friends: A simple but effective method to combine a rule-based system like Systran with a statistical model is to first translate with Systran and then learn a statistical model that translates Systran English into real English. Depending on your biases, you can call this rule-based preprocessing, serial combination, or statistical post-editing. Ueffing et al. show that some information from the rule-based system may help the statistical component, such as annotation of names or other reliable markup.

Aligning and building up dictionaries: What can you do with a large monolingual source language corpus, or a target language corpus, or a conventional bilingual dictionary? Wu et al. present various methods and compare them. Tsunakawa et al. use a pivot language (English) to build a Japanese-Chinese dictionary. Given a syntactically parsed parallel corpus, Zhechev and Way compare different methods for extracting subtrees. Macken et al. extract technical term translations from a domain-specific parallel corpus. Lardilleux and Lepage propose an iterative phrase alignment method that first matches short sentences and phrases, and then subtracts the known alignments from longer phrases to extract the remainder, basically pigeon-holing.
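The basic pivot idea can be sketched as composing two dictionaries through the shared language. This is only the naive baseline, not the method of Tsunakawa et al.; the function name and the romanized toy entries are hypothetical. Candidates are ranked by how many pivot words support them, which gives a crude handle on the ambiguity that pivoting introduces.

```python
from collections import defaultdict


def pivot_dictionary(src_to_pivot, pivot_to_tgt):
    """Compose source->pivot and pivot->target dictionaries.

    Returns, for each source word, its target candidates ordered by the
    number of distinct pivot paths that lead to them (ties broken
    alphabetically).
    """
    support = defaultdict(lambda: defaultdict(int))
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                support[src][tgt] += 1
    return {
        src: sorted(tgts, key=lambda t: (-tgts[t], t))
        for src, tgts in support.items()
    }


# Hypothetical toy entries (romanized): the English pivot "bank" is ambiguous.
JA_EN = {"ginkou": ["bank"], "dote": ["bank", "embankment"]}
EN_ZH = {"bank": ["yinhang", "hean"], "embankment": ["hean", "di"]}

if __name__ == "__main__":
    for word, candidates in pivot_dictionary(JA_EN, EN_ZH).items():
        print(word, candidates)
```

In the toy data, "dote" reaches "hean" through two pivot words, so that candidate is ranked first, while "ginkou" has only the ambiguous pivot "bank" and keeps both candidates with equal support.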

Also: A paper on a Bayesian method for Chinese word segmentation by Xu et al.; a paper on transliteration by Malik et al.; a paper on evaluation, especially on whether the quality of reference translations matters (it does not), by Hamon and Mostefa; and a new grammar formalism for translation by Søgaard.

What's missing? No paper on word alignment or large-scale discriminative training, but there is always Hawaii.

16 comments:

  1. thank you for a great blog entry :-D

  2. Indeed, thanks for such a helpful summary! I think this would have been helpful even if I had been able to attend the conference.

  3. This is a great set of pointers, thanks!

    Minor nit, though - the Blackwood et al. link is wrong. It should be:

    http://www.aclweb.org/anthology-new/C/C08/C08-2005.pdf

  4. Thanks for the great summary!
    However, the link to the evaluation paper by Hamon and Mostefa seems wrong...
    I guess it should be http://www.aclweb.org/anthology/C08-2010, not C08-3010

  5. I am intrigued by the "intellectually bankrupt" comment made at NIST on combining different MT systems. (Disclaimer: I am not 100% up to date on my MT literature, so sorry if some of what I mention below has been addressed elsewhere).

    First of all, I agree that if the purpose is to simply win a NIST evaluation, then there is not much benefit besides creating some upper-bound baseline. Also, doing things just to win the evals is not really in the spirit of the eval in the first place. However, to say the practice of combining systems outputs is without intellectual warrant seems a bit harsh.

    From a theoretical point of view we can ask questions like "Can we bound the error of the combo?" and "Is this error provably lower than that of any single model?", which all get at "Why do they work?". There have been extensive studies along these lines in ensemble ML methods (Bagging, Boosting, Rich Caruana's work). From an algorithmic point of view we can ask "What is the right way of combining systems?". This is particularly interesting for structured output problems like MT. Is there an algorithm that combines all the components of each system in an optimal way? See, e.g., Sagae and Lavie's 2006 NAACL paper on combining parsers. When translating a document, do we calculate the combo for each sentence in isolation? Or maybe put some discourse/consistency constraints on top of the whole thing and create some huge optimization problem (à la Clarke and Lapata's sentence compression work in EMNLP 2007)?

    Finally there are a lot of empirical questions like "What sort of errors are reduced and can we explain this?", "Is this error reduction possible in a single *efficient* model?", "How does an analysis of a combo system guide our construction of the next generation of single models?"

  10. Many institutions limit access to their online information. Making this information available will be an asset to all.
