07 March 2009

n-gram words an language Ordering model with

N-gram language models have been fairly successful at the task of distinguishing homophones, in the context of speech recognition. In machine translation (and other tasks, such as summarization, headline generation, etc.), this is not their job. Their job is to select fluent/grammatical sentences, typically ones which have undergone significant reordering. In a sense, they have to order words. A large part of the thesis of my academic sibling, Radu Soricut, had to do with exploring how well ngram language models can reorder sentences. Briefly, they don't do very well. This is something that our advisor, Daniel Marcu, likes to talk about when he gives invited talk; he shows a 15 word sentence and the preferred reorderings by a ngram LM and they're total hogwash, even though audience members can fairly quickly solve the exponential time problem of reordering the words to make a good sounding sentence. (As an aside, Radu found that if you add in a syntactic LM, things get better... if you don't want to read the whole thesis, just skip forward to section 8.4.2.)

Let's say we like ngram models. They're friendly for many reasons. What could we do to make them more word-order sensitive? I'm not claiming that none of these things have been tried; just that I'm not aware of them having been tried :).
  1. Discriminative training. There's lots of work on discriminative training of language models, but, from what I've seen, it usually has to do with trying to discriminate true sentences from fake sentences, where the fake sentences are generated by some process (eg., an existing MT or speech system, a trigram LM, etc.). The alternative is to directly train a language model to order words. Essentially think of it as a structured prediction problem and try to predict the 8th word based on (say) the two previous. The correct answer is the actual 8th word; the incorrect answer is any other word in the sentence. Words that don't appear in the sentence are "ignored." This is easy to implement and seems to do something reasonable (on a small set of test data).
  2. Add syntactic features to words, eg., via cluster-based language models. My thought here is to look at syntactic features of words (for instance, CCG-style lexicon information) and use these to create descriptors of the words; these can then be clustered (eg., use tree-kernel-style-features) to give a cluster LM. This is similar to how people have added CCG/supertag information to phrase-based MT, although they don't usually do the clustering step. The advantage to clustering is then you (a) get generalization to new words and (b) it fits in nicely with the cluster LM framework.
These both seem like such obvious ideas that they must have been tried... maybe they didn't work? Or maybe I just couldn't dig up papers. Or maybe they're just not good ideas so everyone else dismissed them :).

19 comments:

  1. What I'm about to suggest may not have any practical application, but here goes anyway.

    If you want to discriminatively train e.g. a logistic regression model to identify one permutation from the set of all possible permutations of a fixed string, you face the problem of calculating the partition function (i.e., the sum of the scores of all the permutations). Since the number of permutations grows exponentially with sentence length, exhaustive enumeration of all possible permutations very rapidly becomes impossible.

    It turns out that our statistician friends have been thinking about this problem for a while; a search for "Mallows model" or "generalized Mallows model" should get you started. Most of this work focuses on ranking problems, but I've been wondering whether Mallows models could be extended to include e.g. bigram features between adjacent elements in the permutation (i.e., costs that depend on pairwise adjacencies).

    ReplyDelete
  2. Quick comment... the mallows model idea is definitely interesting. Regarding the training, you could always use a structured prediction algorithm that doesn't require you to sum or argmax...

    ReplyDelete
  3. Typical n-gram LMs suffer from both locality (due to length and sparsity of n-grams - just like HMMs) and label bias (due to backoff/interpolated smoothing - just like HMMs).

    CRFs tackled label bias with a sentence-wide partition function rather than computing each tag locally. Could something like that work for LMs? I've seen people just sample negative data (often from an n-best list from a simpler process) rather than use all n! other orderings.

    Presumably heuristic search could tackle the n! problem at run time.

    ReplyDelete
  4. Could PCFGs be a good solution to the word ordering problem? I mean: Collect a corpus, and tag each word with its part-of-speech. Then train a PCFG on the sequences of POS tags. Then tag each word in a target sentence with its POS, and output the most likely POS sequence wrt the PCFG. Now there may be several word orderings per POS ordering, but will that be a big problem?

    ReplyDelete
  5. Si en tí albergas el deseo de ayudar a otros aqui te pongo como se hace,SALVAR UNA VIDA NO ES POCA COSA ,PIENSALO!!!!!!!!!!!!!!!


    ¿QUÉ ES SER DONANTE DE MÉDULA ÓSEA?
    Ser donante voluntario de Médula Ósea es aceptar firmemente el compromiso moral de donar la médula ósea a un enfermo de cualquier parte del mundo que, sin disponer de familiares compatibles, requiera un trasplante. El único requisito inicial es cumplimentar un formulario y someterse a una pequeña extracción de sangre, como para un análisis de rutina, con el fin de determinar el grupo de histocompatibilidad (HLA).
    ¿QUIÉN PUEDE SER DONANTE?
    Puede incluirse en la Red Mundial de donantes de Médula Ósea a través de REDMO, toda aquella persona con edad comprendida entre los 18 y 55 años y que disfrute de buena salud. El criterio de buena salud consiste en no sufrir enfermedad cardiovascular, renal, pulmonar, de hígado u otras afecciones crónicas que requieran tratamiento continuo, y no tener antecedentes de análisis positivos en cuanto a infecciones de los virus de la hepatitis B, C y síndrome de inmunodeficiencia adquirida (SIDA).
    ¿EN QUÉ CONSISTE DONAR MÉDULA ÓSEA?
    Donar Médula Ósea consiste en proporcionar al enfermo células madre de los glóbulos rojos, glóbulos blancos y plaquetas de la sangre procedente de un donante sano. Ello se lleva a cabo extrayendo del hueso de la cadera una jeringa, una pequeña cantidad de medula, suficiente para conseguir un injerto. Este acto se realiza bajo anestesia general o epidural, y siempre en un hospital especializado, emplazado en la misma localidad o en la más cercana posible a la de residencia del donante,
    En la actualidad, existen estudios muy avanzados cuyos resultados harán posible que en el futuro se generalice la extracción de células madre desde la sangre mediante un proceso alternativo que dura unas dos horas, no requiere anestesia y no tiene más molestias que las de una donación de aféresis para transfusión.
    ¿QUÉ PROBABILIDAD HAY DE SER ELEGIDO PARA DONAR MÉDULA ÓSEA?
    Un donante voluntario de médula ósea puede ser requerido en distintas ocasiones para someterse a nuevas extracciones de sangre que permitirán confirmar y ampliar su tipaje. Ello puede ocurrir inmediatamente tras su inscripción en el Registro, al cabo de cierto tiempo o incluso nunca, si no existiese ningún enfermo potencialmente compatible.
    ¿TIENE RIESGO DONAR MÉDULA ÓSEA?
    No existe otro riesgo que el de la anestesia general o epidural el cual es muy bajo; en personas sanas la probabilidad de complicacciones es de 1 por 50.000 casos.
    Por efecto de la extracción sólo puede aparecer un leve dolor residual en la cadera que desaparece a los pocos días de la donación.
    ¿EXISTE ABSOLUTA LIBERTAD PARA RETIRARSE DEL REGISTRO?
    REDMO es consciente de que las circunstancias personales o físicas de una persona pueden variar a lo largo del tiempo y, por consiguiente, un donante es libre de darse de baja del Registro si así los desea y en cualquier momento, Sin embargo, se recuerda que el ser donante de médula ósea implica un compromiso moral que debe ser cuidadosamente meditado antes e inscribirse en el Registro, y se espera que el donante no cambie de idea si de ello depende la vida de un semejante.
    ¿LA DONACIÓN DE MÉDULA ÓSEA ESTÁ RETRIBUÍDA?
    El donante no recibe compensación económica alguna por el acto de la donación de médula ósea. Los posibles gastos derivados del proceso de donación de su médula le serán costeados en su totalidad.
    La compensación que recibe es la satisfacción de haber salvado una vida, o por lo menos de haberlo intentado.

    Si te animas puede que salves una vida! si una vida!!!!!!!!!!!!!!
    y si lo tienes que pensar mucho pues piensalo y mientras ayudame a difundir este tema,sé que no conseguiré abarrotar el registro de donantes,pero si tu y yo conseguimos que uno de nosotros lo haga una persona en este mundo que con desesperación está esperando que la bondad y solidaridad de otro le salve la vida,tendrá una respuesta.Quisiera decirte que soy donante con gusto lo haría, pero no puedo tengo anemías crónicas y una salud que no es la mejor,si no fuera así con orgullo te diría que lo hice ,estoy en campaña de mejorar mi salud,y si puedo te lo cuento.
    Mientras pongamos nuestro granito de arena en esta causa,ayudemos, un día podría tocarnos una situación así ,ojalá nunca nos toque,pero aunque no sea nuestra realidad si que es la realidad de muchos,pongamos el corazón en esto difundamoslo y seamos donantes!

    Yo te informo,tu decides y por favor cuentale esto a otros así podamos llegar a todos.

    ReplyDelete
  6. Seems to me that n-POS-tags (as opposed to n-words) would be a better way of approaching a word-ordering task. Assuming you could reliably POS-tag each individual word in the test cases, which sounds non-trivial. Interesting questions.

    ReplyDelete
  7. Following up on Mark Johnson's comment, we (Regina Barzilay's group at MIT) did some recent work on applying the Mallows model to document level structuring. We have a NAACL paper on that stuff:

    http://people.csail.mit.edu/harr/papers/naacl2009.pdf

    ReplyDelete
  8. I know that this comment is a month late, but the thought just occurred to me. What about just using the likelihood ratio of the ngram model to the unigram model? That would factor out the contribution of the words themselves and leave only the contribution due to the order of the words.

    ReplyDelete
  9. 酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

    ReplyDelete
  10. I've seen people just sample negative data (often from an n-best list from a simpler process) rather than use all n! other orderings.
    Assignment Help | Coursework Help | Thesis Help

    ReplyDelete
  11. Now there may be several word orderings per POS ordering, but will that be a big problem?
    Dissertation Help | Essay Help

    ReplyDelete
  12. Hi,
    This is inspiring; I am very pleased by this post. Nice work, thanks for such information.

    Coursework help

    ReplyDelete
  13. Hi,
    Interesting topic! Hope you will elaborate more on it in future posts
    Custom Essay Writing

    ReplyDelete
  14. Hi,
    I haven’t any word to appreciate this post.....Really I am impressed from this post

    Custom Term Paper

    ReplyDelete
  15. N-gram language models is so great to use and is very sucessful. I think it is so good.
    orlando accident lawyers

    ReplyDelete
  16. Hi the ordering model with is a grates task is not easy doubt you need a lot of patients and the N words some people even know the real meaning .

    ReplyDelete
  17. This comment has been removed by the author.

    ReplyDelete
  18. We offer a great help - writing services. As result - a lot of free time!

    ReplyDelete