I recently co-organized a BayesNLP workshop with Yee Whye. Here's a brief summary of a subset of the talks and the discussions that took place.
Topic models and language modeling: A lot of the discussion and papers were about either TMs, LMs or both. Much of the discussion of topic models was how to introduce Markov-style dependencies. There are essentially three ways to do this: (1) make word i dependent on topic i and word (i-1); (2) make topic i dependent on topic (i-1); (3) both. Basically this comes out in how you structure the "beta" language models (in LDA terminology). There is a trade-off between number of params (|vocab| * (# top)^2 versus |vocab|^2 * (# top)) and the ability to fit data. I think the general consensus is that if you have a lot of data (which you should!) then you should use the more expressive models.
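To get a feel for how far apart those two parameter counts are, here is a small arithmetic sketch. The function name and the example sizes (a 10,000-word vocabulary, 100 topics) are made up for illustration and are not from any of the workshop papers.

```python
def beta_param_counts(vocab_size, num_topics):
    """Return the two parameter counts quoted above, in the order quoted."""
    topic_squared = vocab_size * num_topics ** 2   # |vocab| * (# top)^2
    vocab_squared = vocab_size ** 2 * num_topics   # |vocab|^2 * (# top)
    return topic_squared, vocab_squared


# Hypothetical sizes, chosen only to make the gap concrete.
smaller, larger = beta_param_counts(vocab_size=10_000, num_topics=100)
ratio = larger / smaller  # algebraically this is vocab_size / num_topics
print(smaller, larger, ratio)
```

The ratio between the two counts is just |vocab| / (# top), so with any realistic vocabulary the second parameterization is much larger.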
The major problem with these models is that they are often evaluated by their perplexity on test data. These perplexities are significantly higher than those obtained by people in the speech community, which raises the "why should I care?" question (see this other entry). There are several potential answers: (1) topics can be embedded in a task (say MT) and this leads to better performance; (2) topics are used to enable new tasks (browsing Science repositories); (3) topics can be compared with what humans do in a CogSci manner.
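For reference, perplexity here is the exponential of the average negative log-probability the model assigns to each test token. A minimal sketch, assuming you already have the per-token probabilities from some model (the function name and the toy numbers are hypothetical):

```python
import math


def perplexity(token_probs):
    """Perplexity of a test sequence given the model's probability for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# A model that is uniformly unsure among 4 choices at every position.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
print(uniform_ppl)
```

Lower is better: a model that spreads its mass uniformly over k options at every position has perplexity k.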
This topic led into some incomplete discussion on what sorts of problems we might want to work on in the future. I don't think there was a solid decision made. In terms of what applications might be interesting, I think the agreement was that Bayesian techniques are most useful in problems for which there is insufficient data to fit all parameters well. Since "there's no data like more data" has become a mantra in NLP, this seems like it would include every problem! My opinion is that Bayesian methods will turn out to be most useful for largely unsupervised tasks, where my prior knowledge can be encoded as structure. I think there's lots of room to grow into new application domains (similar to some stuff Andrew McCallum has been working on in social network analysis). Introducing new tasks makes evaluation difficult, which can make publication difficult (your eight pages have to go both to technique and evaluation), but I think it's the right way for the community to head.
I also really liked Yee Whye's talk (which happened to propose basically the same model as a paper by Goldwater, Griffiths and Johnson at this same NIPS), where he basically gave an interpretation of KN smoothing as a nonparametric Bayesian model with a Poisson-Dirichlet prior. Unlike previous methods to explain why KN works, this actually gives superior results to interpolated KN (though it loses to modified interpolated KN). Shaojun talked about integrating a whole bunch of stuff (Markov models, grammars and topics) into a language model using directed Markov fields as an "interface" language. This was really cute and they seem to be doing really well (going against the above comment that it's hard to get comparable perplexities). I believe there's an upcoming CL paper on this topic.
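For anyone who hasn't seen it written down, here is a rough sketch of plain interpolated KN for bigrams, which is the baseline being discussed. To be clear, this is the standard textbook estimator, not the Bayesian model from the talk; the function name, the fixed discount of 0.75, and the toy sentence are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_kn_bigram(tokens, discount=0.75):
    """Return prob(w, u) = interpolated Kneser-Ney estimate of P(w | u)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()        # c(u): how often u occurs as a context
    followers = defaultdict(set)     # distinct word types seen after u
    preceders = defaultdict(set)     # distinct word types seen before w
    for (u, w), c in bigrams.items():
        context_count[u] += c
        followers[u].add(w)
        preceders[w].add(u)
    num_bigram_types = len(bigrams)

    def prob(w, u):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / num_bigram_types
        c_u = context_count.get(u, 0)
        if c_u == 0:
            return p_cont            # unseen context: back off entirely
        discounted = max(bigrams.get((u, w), 0) - discount, 0.0) / c_u
        # Mass removed by discounting, redistributed via p_cont.
        backoff_weight = discount * len(followers[u]) / c_u
        return discounted + backoff_weight * p_cont

    return prob


tokens = "the cat sat on the mat the cat ate".split()
prob = train_kn_bigram(tokens)
total = sum(prob(w, "the") for w in set(tokens))  # should sum to 1
print(prob("cat", "the"), total)
```

The point to notice is that the lower-order distribution counts distinct contexts rather than raw frequency, and a fixed amount is subtracted from every observed count.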
If anyone else took part in the BNLP workshop and would like to comment, you're more than welcome.

Comments:
Why do topic models have higher perplexity? Is it simply because they don't incorporate n-order Markov dependencies among words, or do you think there are other reasons?
I feel like topic models are so great for things like visualization, knowledge discovery, and clustering. There must be some way to use them successfully as language models.
I think one reason is the lack of Markov dependencies. And we've seen repeatedly that simple interpolation of word cluster models and standard n-gram models doesn't seem to help much, making the "easy road out" unattractive. But I think that, vis-a-vis LM, they're at a fundamental disadvantage to the word cluster models that are more common in NLP (Brown-style clustering). Topic models find good clusters of words based on global document features. NLP-cluster models find them based on local word features. So I'd expect a topic model to put "happy" and "happiness" in the same topic, but an n-gram cluster model to put "bought" and "built" in the same cluster. I think that, for the purpose of perplexity reduction, the latter is better (basic smoothing). This is why I think new applications might be the way to go.
The workshop was extremely interesting for me (thanks to the organizers!).
But because the models and techniques in Bayesian NLP are becoming increasingly intricate, as presented at the workshop, I'm afraid that a serious gap may open up between ordinary NLP researchers and more machine-learning-oriented researchers like the participants of this workshop.
I clearly remember that McCallum described this situation by saying that Bayesian NLP methods are becoming "esoteric."
Of course, I'm enjoying the recent progress very much, and have no doubts about these advances.
However, in order to bridge the potential gap between the two groups (and to replace argmax! :-)), I think it is also very important to spread the knowledge, even if only the ideas, to wider audiences in natural language processing.
I hope this workshop will be held again in the future.