I had a post a while back on teaching ML that I still basically stand by. I've taught intro grad NLP here at UMD twice now, and a sort-of-similar course back at Utah once. I find these courses really hard to teach. And not for the usually bemoaned reason of the CS/linguistics mix -- I think it's possible to deal with that, and certainly it's an issue that's been talked about a lot.
What I find difficult is that NLP (and CL) is a collection of problems, techniques, ideas, frameworks, etc. that really are not tied together in any reasonable way other than the fact that they have to do with NLP. Even if you manage to answer questions about "what sort of topics are most interesting?" you're still faced with the problem that every time you switch topics, the entire context in which you're discussing them changes. This is exacerbated by the fact that things like tagging and parsing are hopelessly boring (in comparison to all the cool interesting stuff in NLP these days), and yet so many modern ideas are based on understanding basic dynamic programming for tree structures and things like that.
To make things a bit more concrete, a standard intro NLP class might start with morphology. Okay, so you have to explain what morphemes are and why they're important. Now, you probably will take a finite state approach, so you have to explain transducers. If you want these things to work, you have to explain weighted transducers. Do you do probabilities? In that case there's the whole local vs. global normalization issue, which takes more time. So now you want to do POS tagging or something. Fine, you can do that with finite state models too. But no one actually does this any more (except lingpipe :P). So you have to explain POS stuff, perhaps how this works in non-English, and then you can leave them with HMMs (maybe talking about the Viterbi algorithm) or do lots of ML so you can get to CRFs or structured perceptron or something. And we're still at POS tagging. Now you switch to parsing. Back to square one. And then you want to do compositional semantics: now there's lots more structure, lots more features and so on. But even now things are at least somewhat connected. But then you talk about lexical semantics: be it distributed representations or WSD or whatever, the problem is new, the techniques are new (do you teach Yarowsky?), the evaluation is new and so on.
I think it's worth contrasting this with ML. I find ML remarkably easy to teach (so I'm flipping the classroom this coming Spring for the UG version to make it more exciting) despite the fact that the material is (in many ways) much harder for CS types. The thing that is nice about ML is that the problem is basically always the same (or at least changes only once, when you switch from supervised to unsupervised). In that sense, ML tends to be a course about techniques for a relatively fixed problem (or at least fixed problem type). This makes for significantly less context switching, which makes learning easier (and thereby makes teaching easier).
So the question I wanted to ask is: can we do something similar in NLP? The crazy idea that I'm sure everyone will say is insane is the following: teach NLP as a course about what you can do with log-linear models. Here's how I envision it. You spend the first day talking about NLP and why data is important, ambiguity, etc., just like normal. You spend the next two days explaining enough about log-linear models that you can treat them as given for the rest of the semester. Maybe you show how to optimize them by gradient descent or something, but basically enough that anyone who is simultaneously taking ML will get more out of it, while those who are not are fine with log-linear models as a black box.
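To make the "two days on log-linear models" concrete, here is a minimal sketch of the kind of thing I have in mind: a multiclass log-linear classifier with sparse features, trained by stochastic gradient ascent on the log-likelihood. (This is my own illustrative code, not anything from a particular course; the feature/weight representation as a dict keyed by (feature, label) pairs is just one convenient choice.)

```python
import math
import random

def predict(weights, features, labels):
    """Softmax over labels: p(y|x) proportional to exp(w . f(x, y))."""
    scores = {y: math.exp(sum(weights.get((f, y), 0.0) * v
                              for f, v in features.items()))
              for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def train(data, labels, epochs=20, lr=0.1):
    """Maximize log-likelihood by stochastic gradient ascent.

    data: list of (features, gold_label) pairs, where features is a
    dict mapping feature names to values.
    """
    weights = {}
    for _ in range(epochs):
        random.shuffle(data)
        for features, gold in data:
            probs = predict(weights, features, labels)
            for f, v in features.items():
                for y in labels:
                    # gradient of log-likelihood: observed minus expected counts
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    weights[(f, y)] = weights.get((f, y), 0.0) + lr * grad * v
    return weights
```

The point for the course is that students would get this (or something like it, with a real optimizer) off the shelf; all they would touch is the feature function.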
Now, when you teach different topics, the framework in which you discuss them is the same. You have a structured problem (which forces you to talk about algorithms like Viterbi or CKY) with interesting ambiguities (which forces you to talk about features). Then, the class essentially becomes a sequence of problems, associated algorithms and relevant features. The rest is left as a black box, which can be provided off the shelf for programming projects, and they can focus on the interesting and more NLP-ish problems of algorithms and features. You could even start with something like sentiment classification (at a document level) to make the beginning gentle.
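As one example of the "algorithms plus features" pairing, here is a sketch of Viterbi for sequence tagging, written so that the local score is an arbitrary function -- in the log-linear framing, that score would be w . f(prev_tag, tag, words, i), with the weights coming from the black-box learner. (The interface is my own illustration, not a standard API.)

```python
def viterbi(words, tags, score):
    """Find the best tag sequence under additive per-position scores.

    score(prev_tag, tag, words, i) returns the (log) score of assigning
    `tag` at position i given the previous tag (None at position 0).
    """
    # best[i][t] = score of the best sequence ending at position i with tag t
    best = [{t: score(None, t, words, 0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, s = max(((p, best[i - 1][p] + score(p, t, words, i))
                           for p in tags), key=lambda x: x[1])
            best[i][t] = s
            back[i][t] = prev
    # follow backpointers from the best final tag
    t = max(best[-1], key=best[-1].get)
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return list(reversed(path))
```

Swapping in CKY for parsing keeps the same shape: the students' job is the feature function inside `score`; the dynamic program and the learner stay fixed.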
I realize there are some things you couldn't do this way, or that would be very awkward to do this way: anything generative or unsupervised, which often go together. For instance, word alignment via the IBM models won't fit. Topic models won't fit (though I don't usually do them anyway -- maybe I should). Probably there are some other things too.
Anyway, I'd be curious to hear what people think of this idea. I know it's biased by my own view of the world, but hey -- that's why I'm a professor (or at least why I assist professors...). Or if anyone has tried it.