At a high level, if there is one thing I want them to remember after the semester is over it's the idea of generalization and how it relates to function complexity. That's it. Now, more operationally, I'd like them to learn SVMs (and kernels) and EM for generative models.
In my opinion, the whole tenor of the class is set by how it starts. Here's how I start.
- Decision trees. No entropy. No mutual information. Just decision trees based on classification accuracy. Why? Because the point isn't to teach them decision trees. The point is to get as quickly as possible to the point where we can talk about things like generalization and function complexity. Why decision trees? Because EVERYONE gets them. They're so intuitive. And analogies to 20 questions abound. We also talk about the whole notion of data being drawn from a distribution and what it means to predict well in the future.
- Nearest neighbor classifiers. No radial basis functions, no locally weighted methods, etc. Why? Because I want to introduce the idea of thinking of data as points in high dimensional space. This is a big step for a lot of people, and one that takes some getting used to. We then do k-nearest neighbor and relate it to generalization, overfitting, etc. The punch line of this section is the idea of a decision boundary and the complexity of decision boundaries.
- Linear algebra and calculus review. At this point, they're ready to see why these things matter. We've already hinted at learning as some sort of optimization (via decision trees) and data in high dimensions, hence calculus and linear algebra. Note: no real probability here.
- Linear classifiers as methods for directly optimizing a decision boundary. We start with 0-1 loss and then move to perceptron. Students love perceptron because it's so procedural.
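To make the "procedural" point concrete, here is a minimal sketch of the perceptron update in plain Python. It is an illustration rather than the code used in the class: the function names (`perceptron_train`, `zero_one_loss`), the labels in {-1, +1}, the epoch cap, and the toy data are all assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def perceptron_train(X, y, epochs=10):
    """Train a perceptron; X is a list of feature lists, y holds -1/+1 labels."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # A non-positive margin y * (w.x + b) counts as a mistake.
            if y_i * (dot(w, x_i) + b) <= 0:
                # Nudge the boundary toward classifying this example correctly.
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += y_i
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no mistakes: the data is separated
    return w, b

def zero_one_loss(X, y, w, b):
    """Fraction of examples the linear classifier gets wrong."""
    preds = [1 if dot(w, x_i) + b > 0 else -1 for x_i in X]
    return sum(p != y_i for p, y_i in zip(preds, y)) / len(y)

# Toy, linearly separable data (made up for illustration).
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, b = perceptron_train(X, y)
print(w, b, zero_one_loss(X, y, w, b))
```

The 0-1 loss is what we care about but can't easily optimize directly; the perceptron sidesteps that by only ever reacting to individual mistakes, which is a big part of why it feels like a recipe.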
The sad thing is that there are basically no books that follow any order even remotely like this. Except...drum roll... it's actually not far from what Mitchell's book does. Except he does kNN much later. It's really depressing how bad most machine learning books are from a pedagogical perspective... you'd think that in 12 years someone would have written something that works better.
On top of that, the most recent time I taught ML, I structured everything around recommender systems. You can actually make it all work, and it's a lot of fun. We did recommender systems for classes here at the U (I had 90-odd students from AI the previous semester fill out ratings on classes they'd taken in the past). The data was a bit sparse, but I think it worked out well.
The other thing I changed most recently that I'm very happy with is that I have a full project on feature engineering. (It ties in to the course recommender system idea.) Why? Because most people who take ML, if they ever use it at all, will need to do this. It's maybe one of the most important things that they'll have to learn. We should try to teach it. Again, something that no one ever talks about in books.
Anyway, that's my set of tricks. If you have some that you particularly like, feel free to share!