Topic modeling has turned into a bit of a cottage industry in the NLP/machine learning world. Most of it seems to stem from latent Dirichlet allocation, though LDA itself of course built on earlier techniques, the most well-known of which is latent semantic analysis. At the end of the day, such "topic models" really look more like dimensionality reduction techniques (note, e.g., the similarity to multinomial PCA); in practice, however, they're often used as (perhaps soft) clustering methods: words are mapped to topics, topics are used as features, and those features are fed into some learning algorithm.
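To make that pipeline concrete, here's a minimal sketch assuming scikit-learn; the toy documents, labels, and topic count are all placeholders, just there to show the words-to-topics-to-features flow:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Toy documents and labels, purely for illustration.
docs = [
    "the film was a fine drama with a strong actor",
    "students and teachers at the public school",
    "the movie was a funny comedy worth the theater trip",
    "education policy for teachers in high schools",
]
labels = [0, 1, 0, 1]  # e.g., arts vs. education

# Words -> topics: fit LDA on raw term counts.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document

# Topics -> features: feed the topic proportions to a downstream learner.
clf = LogisticRegression().fit(doc_topics, labels)
print(clf.predict(doc_topics))
```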
One thing that's interested me for a while is how these topic models, viewed as clustering algorithms, compare with more standard word clustering algorithms from the NLP community; for instance, Brown clustering (built into SRILM), which clusters words based on context. (Lots of other word clustering techniques exist, but they pretty much all cluster based on local context, where "local" means either positionally local or local in a syntactic tree.)
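For intuition, here's a toy sketch of what "clustering on local context" means. This is not Brown's actual agglomerative, mutual-information-based merge procedure; it just represents each word by counts of its immediate neighbors and runs k-means, with the corpus and cluster count made up for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Tiny placeholder corpus.
sentences = [["the", "film", "was", "good"],
             ["the", "movie", "was", "bad"],
             ["the", "school", "is", "open"],
             ["the", "clinic", "is", "open"]]

vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Represent each word by counts of its immediate left and right neighbors.
ctx = np.zeros((len(vocab), 2 * len(vocab)))
for s in sentences:
    for i, w in enumerate(s):
        if i > 0:
            ctx[idx[w], idx[s[i - 1]]] += 1               # left neighbor
        if i < len(s) - 1:
            ctx[idx[w], len(vocab) + idx[s[i + 1]]] += 1  # right neighbor

# Words with similar local contexts land in the same cluster
# (here, film/movie end up together, as do school/clinic).
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(ctx)
print(dict(zip(vocab, clusters)))
```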
I think the general high-level story is that "topic models" go for semantics while "clustering models" go for syntax. That is, clustering models will tend to cluster together words that appear in similar local contexts, while topic models will cluster together words that appear in similar global contexts. I've even heard stories that, given the choice between POS tags and Brown clusters as features in a model, it really doesn't make a difference.
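The complementary "global" view, in the same toy spirit as the sketch above: represent each word by which documents it occurs in, rather than by its immediate neighbors, so words sharing topics (not neighbors) end up looking alike. Again, the documents are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder documents, each just a bag of words.
docs = [["film", "actor", "drama", "theater"],
        ["movie", "actor", "comedy", "flick"],
        ["school", "students", "teachers", "education"],
        ["university", "students", "library", "education"]]

vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-document occurrence matrix: the "global" context.
occ = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        occ[idx[w], j] += 1

# Words from the same documents cluster together, regardless of syntax.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(occ)
print(dict(zip(vocab, clusters)))
```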
I think this sentiment is a bit unfair to clustering models. Saying that context-based clustering models only find syntactically similar words is just not true. Consider the example clusters from the original LDA paper (the top portion of Figure 8). If we look up "film" in CBC (Pantel and Lin's Clustering By Committee), we get: movie, film, comedy, drama, musical, thriller, documentary, flick, etc. (I've left out multiword entries.) The LDA list contains: new, film, show, music, movie, play, musical, best, actor, etc. From CBC we never get things like "actor" or "york" (presumably "new york" is why the odd-looking "new" appeared in the LDA list), "love" or "theater", but it's unclear whether this is good or bad. Perhaps with more topics, these things would have gone into separate topics.
If we look up "school" in CBC, we get: hospital, school, clinic, center, laboratory, lab, library, institute, university, etc. Again, this is a different sort of list from the LDA one, which contains: school, students, schools, education, teachers, high, public, teacher, bennett, manigat, state, president, etc.
It seems like the syntactic/semantic distinction is not quite right. In some sense, with the first list, LDA is being more liberal in what it considers film-like, with CBC being more conservative. OTOH, with the "school" list, CBC seems to be more liberal.
I realize, of course, that this is comparing apples and oranges: the data sets are different, the models are different, the preprocessing is different, etc. But it's still pretty clear that both sorts of models are getting at the same basic information. It would be cool to see some work that tried to get leverage from both local context and global context, though perhaps this wouldn't be especially beneficial, since (at least judging from these two lists) the approaches don't seem to produce strongly complementary results. I've also seen questions about how to get topics out of topic models that are "disjoint" in some sense; this is something CBC does automatically. Perhaps a disjoint LDA could leverage these ideas.
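One naive post-hoc way to get disjoint word clusters out of a fitted LDA model (a sketch of the general idea, not the hypothetical disjoint-LDA model above) is to assign each vocabulary word to the single topic under which it has the highest weight. Assuming scikit-learn, with toy documents as placeholders:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the film was a fine drama with a good actor",
        "students and teachers at the public school",
        "the movie comedy had a great musical score",
        "education policy for high schools and universities"]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# lda.components_ is the (topics x words) weight matrix; taking the argmax
# over topics gives each word exactly one topic, so the resulting word
# lists are disjoint by construction, CBC-style.
hard_topic = lda.components_.argmax(axis=0)
words = vec.get_feature_names_out()
for t in range(2):
    print(t, [w for w, k in zip(words, hard_topic) if k == t])
```

This throws away exactly the soft, mixed-membership structure that makes LDA interesting, of course; a real disjoint LDA would presumably build the constraint into the model rather than bolt it on afterwards.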