A while back I came across a paper that (in a completely separate context) argues for approximating problems in lieu of approximating solutions. This idea has a nice analogue in NLP: should we (A) choose a simple model for which we can do exact inference or (B) choose a complex model that is closer to the truth for which exact inference is not tractable. (A) is approximating the problem while (B) is approximating the solution.
It seems that all signs point to (B). In almost every interesting case I know of, it helps (or at the very least doesn't hurt) to move to more complex models that are more expressive, even if this renders learning or search intractable. This story is well known in word alignment (eg, GIZA) and MT (eg, model 4 decoding), but also has simpler examples in parsing (cf, McDonald), sequence labeling (cf, Sutton), relation extraction (cf, Culotta), as well as pretty much any area in which "joint inference" has been shown to be helpful.
One sobering example here is the story in word alignment, where one cannot go and directly use, say, model 4 for computing alignments, but must first follow a strict recipe: run a few iterations of model 1, followed by a few model 2, followed by some HMM, then model 4 (skipping model 3 all together). The problem here is that learning model 4 parameters directly falls into local minima too easily, so one must initialize intelligently, by using outputs of previous iterations. My guess is that this result will continue to hold for training (though perhaps not predicting) with more an more complex models. This is unfortunate, and there may be ways of coming up with learning algorithms that automatically initialize themselves by some mechanism for simplifying their own structure (seems like a fun open question, somewhat related to recent work by Smith).
Aside from a strong suggestion as to how to design models and inference procedure (i.e., ignore tractability in favor of expressiveness), there may be something interesting to say here about human language processing. If it is indeed true that, for the most part, we can computationally move to more complex models, forgoing tractable search, then it is not implausible to imagine that perhaps humans do the same thing. My knowledge in this area is sparse, but my general understanding is that various models of human language processing are disfavored because they would be too computationally difficult. But if, as in old-school AI, we believe that humans just have a really good innate search algorithm, then this observation might lead us to believe that we have, ourselves, very complex, intractable "models" in our heads.
This is an interesting question you raised, Hal! Do humans have complex models and perform approximate inference or vice versa? The former seems more plausible to me, although I have no evidence. If you look at all sorts of human activities (playing chess, driving a car, recognizing speech), it does seem like there is a lot of variability and fuzziness in the process, which implies an approximate inference going on in the head.
ReplyDeleteHowever, just because humans do things one way doesn't imply that computer need to do the same, too! :)
This issue's been visited at book length in: _How to Solve It: Modern Heuristics_, 2004,
ReplyDeleteZbigniew Michalewicz and David B. Fogel, Springer.
The Bayesian statisticians have clearly adopted the "heuristic" approach because their models are too complex to solve analytically. Hence all the Monte Carlo methods such as Gibbs Sampling to do inference.
What's the status of using a beam? You may have a "proper" implementation of Viterbi, but prune it so the search is no longer guaranteed to be exact. Does that qualify as approximate inference or is it just corner cutting?
Kevin -- I agree: just because humans do something in one way doesn't me we should try to simulate it... OTOH, if it turns out that approximate methods ALSO work well computationally, this is somewhat interesting.
ReplyDeleteBob -- I haven't seen that book, but it reminded me of a talk I saw at NIPS04 by Gerd Gigerenzer titled "Fast and Frugal Heuristics: The Adaptive Toolbox" which I think was probably about similar things.
In general I don't draw distinctions between approximate inference techniques: sampling, variational, BP, EP, beam search, hill-climbing local search, etc. I think that this itself is an interesting other question I've been thinking about posting for a while: sampling and the various deterministic approximations have nice theoretical guarantees, but in practice, you pretty much should always just use beam search (I'm kidding... sort of).
酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花
ReplyDelete