A while back I came across a paper that (in a completely separate context) argues for approximating problems in lieu of approximating solutions. This idea has a nice analogue in NLP: should we (A) choose a simple model for which we can do exact inference, or (B) choose a complex model that is closer to the truth, but for which exact inference is intractable? (A) is approximating the problem, while (B) is approximating the solution.
It seems that all signs point to (B). In almost every interesting case I know of, it helps (or at the very least doesn't hurt) to move to more complex, more expressive models, even if this renders learning or search intractable. This story is well known in word alignment (e.g., GIZA) and MT (e.g., model 4 decoding), but also has simpler examples in parsing (cf. McDonald), sequence labeling (cf. Sutton), relation extraction (cf. Culotta), as well as pretty much any area in which "joint inference" has been shown to be helpful.
One sobering example here is the story in word alignment, where one cannot directly use, say, model 4 for computing alignments, but must first follow a strict recipe: run a few iterations of model 1, followed by a few of model 2, followed by some HMM iterations, then model 4 (skipping model 3 altogether). The problem here is that learning model 4 parameters directly falls into local minima too easily, so one must initialize intelligently, using the output of each simpler model to seed the next. My guess is that this result will continue to hold for training (though perhaps not predicting) with more and more complex models. This is unfortunate, and there may be ways of coming up with learning algorithms that automatically initialize themselves by some mechanism for simplifying their own structure (seems like a fun open question, somewhat related to recent work by Smith).
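Since the recipe above is really just staged initialization, here is a minimal sketch of the idea (certainly not GIZA++ itself): train IBM Model 1 with EM and hand its translation table to whatever more complex stage comes next. The toy corpus, function names, and iteration counts are all illustrative assumptions.

    # Sketch of staged initialization for word alignment (not GIZA++ code).
    from collections import defaultdict

    NULL = "<null>"

    def train_model1(bitext, iterations=5, t=None):
        """EM for IBM Model 1 over a list of (source_tokens, target_tokens).
        If `t` is given, it is used as the starting translation table -- this
        is how a later, more complex stage would be seeded by an earlier one."""
        if t is None:
            # Uniform initialization when no earlier stage provides parameters.
            target_vocab = {w for (_, tgt) in bitext for w in tgt}
            uniform = 1.0 / len(target_vocab)
            t = defaultdict(lambda: uniform)

        for _ in range(iterations):
            count = defaultdict(float)   # expected counts of (src, tgt) pairs
            total = defaultdict(float)   # expected counts of src words
            for src, tgt in bitext:
                src = [NULL] + src       # allow target words to align to NULL
                for f in tgt:
                    z = sum(t[(e, f)] for e in src)   # normalizer over links
                    for e in src:
                        c = t[(e, f)] / z             # posterior of this link
                        count[(e, f)] += c
                        total[e] += c
            # Re-estimate t(f | e) from the expected counts.
            t = defaultdict(float, {(e, f): count[(e, f)] / total[e]
                                    for (e, f) in count})
        return t

    # Purely illustrative toy parallel corpus.
    bitext = [
        (["the", "house"], ["la", "maison"]),
        (["the", "book"],  ["le", "livre"]),
        (["a", "book"],    ["un", "livre"]),
    ]

    t1 = train_model1(bitext, iterations=5)
    # The staged recipe: the next, more complex stage (model 2, HMM, ...) would
    # not start from scratch but from t1, e.g. something like
    #   t2 = train_model2(bitext, iterations=5, t=t1)   # hypothetical next stage
    print(sorted(t1.items(), key=lambda kv: -kv[1])[:5])

The point is only that each stage's objective is non-convex, and the simpler model's solution is a cheap, decent starting point for the harder one.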
Aside from a strong suggestion as to how to design models and inference procedures (i.e., ignore tractability in favor of expressiveness), there may be something interesting to say here about human language processing. If it is indeed true that, for the most part, we can computationally move to more complex models, forgoing tractable search, then it is not implausible that humans do the same thing. My knowledge in this area is sparse, but my general understanding is that various models of human language processing are disfavored because they would be too computationally difficult. But if, as in old-school AI, we believe that humans just have a really good innate search algorithm, then this observation might lead us to believe that we have, ourselves, very complex, intractable "models" in our heads.