I've had discussions about this with tons of people, and it seems like my approach is fairly odd. So I thought I'd blog about it because I've put a lot of thought into it over the past four offerings of the machine learning course here at Utah.

At a high level, if there is one thing I want them to remember after the semester is over, it's the idea of generalization and how it relates to function complexity. That's it. Now, more operationally, I'd like them to learn SVMs (and kernels) and EM for generative models.

In my opinion, the whole tenor of the class is set by how it starts. Here's how I start.

- Decision trees. No entropy. No mutual information. Just decision trees based on classification accuracy. Why? Because the point isn't to teach them decision trees. The goal is to get, as quickly as possible, to where we can talk about things like generalization and function complexity. Why decision trees? Because EVERYONE gets them. They're so intuitive, and analogies to 20 questions abound. We also talk about the whole notion of data being drawn from a distribution and what it means to predict well in the future.
- Nearest neighbor classifiers. No radial basis functions, no locally weighted methods, etc. Why? Because I want to introduce the idea of thinking of data as points in high-dimensional space. This is a big step for a lot of people, and one that takes some getting used to. We then do k-nearest neighbor and relate it to generalization, overfitting, etc. The punch line of this section is the idea of a decision boundary and the complexity of decision boundaries.
- Linear algebra and calculus review. At this point, they're ready to see why these things matter. We've already hinted at learning as some sort of optimization (via decision trees) and at data as points in high dimensions (via nearest neighbor), hence calculus and linear algebra. Note: no real probability here.
- Linear classifiers as methods for directly optimizing a decision boundary. We start with 0-1 loss and then move to perceptron. Students love perceptron because it's so procedural.
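The first two steps lean on two very simple learners. Here is a minimal sketch of both in Python (one of the languages discussed in the comments), assuming yes/no features for the tree and real-valued points for nearest neighbor. All function names and toy data are hypothetical, made up for illustration; this is not the actual course code. The stump picks its single question by training accuracy alone, in the spirit of skipping entropy:

```python
import math
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predict, X, y):
    """Fraction of examples predicted correctly (1 minus the 0-1 loss)."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def train_stump(X, y):
    """Depth-1 decision tree over yes/no features, choosing the question
    with the best training accuracy (no entropy, no mutual information)."""
    best = None
    for f in range(len(X[0])):
        yes = [label for x, label in zip(X, y) if x[f]]
        no = [label for x, label in zip(X, y) if not x[f]]
        # Each leaf predicts its majority label; an empty leaf falls back
        # to the overall majority label.
        yes_pred = majority(yes) if yes else majority(y)
        no_pred = majority(no) if no else majority(y)
        correct = sum(lab == yes_pred for lab in yes) + sum(lab == no_pred for lab in no)
        if best is None or correct > best[0]:
            best = (correct, f, yes_pred, no_pred)
    _, f, yes_pred, no_pred = best
    return lambda x: yes_pred if x[f] else no_pred


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x
    (Euclidean distance; math.dist needs Python 3.8+)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    return majority([label for _, label in neighbors[:k]])


# Hypothetical toy data: two yes/no questions about a class.
X_bin = [(1, 0), (1, 1), (0, 1), (0, 0)]
y_bin = ["like", "like", "dislike", "dislike"]
stump = train_stump(X_bin, y_bin)
print(stump((1, 0)), accuracy(stump, X_bin, y_bin))  # like 1.0

# Hypothetical points in 2-D forming two well-separated clusters.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(pts, labels, (0.5, 0.5), k=3))  # a
# 1-NN evaluated on its own training data.
print(accuracy(lambda x: knn_predict(pts, labels, x, k=1), pts, labels))  # 1.0
```

The last line is the generalization lesson in miniature: with distinct training points, 1-NN is guaranteed perfect training accuracy (each point is its own nearest neighbor), so that number says nothing about how well it will predict in the future.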

The sad thing is that there are basically no books that follow any order even remotely like this. Except... drum roll... it's actually not far from what Mitchell's book does, except that he does kNN much later. It's really depressing how bad most machine learning books are from a pedagogical perspective... you'd think that in 12 years someone would have written something that works better.

On top of that, the most recent time I taught ML, I structured everything around recommender systems. You can actually make it all work, and it's a lot of fun. We built recommender systems for classes here at the U (I had 90-odd students from the previous semester's AI class fill out ratings for classes they'd taken in the past). The data was a bit sparse, but I think it was a lot of fun.

The other thing I changed most recently that I'm very happy with is that I have a full project on feature engineering. (It ties into the course recommender system idea.) Why? Because most people who take ML, if they ever use it at all, will need to do this. It's maybe one of the most important things they'll have to learn, so we should try to teach it. Again, it's something that no one ever talks about in books.

Anyway, that's my set of tricks. If you have some that you particularly like, feel free to share!

## 20 comments:

Hal, thanks for the great post! I really like this way of teaching, which balances nicely between practical use and theoretical reasoning.

May I ask which programming language you recommend for the class? Have you chosen general-purpose languages like Java or Python, or a numerical computing environment such as Matlab? I think for an undergraduate class it is a very important question. In our school it is currently done with Java+Weka. But recently I've come to believe that Matlab makes more sense and can give students more insight into the algorithms. Any opinion?

Weiwei: I think weka has quite a high barrier to entry if your main objective is understanding and implementing learning algorithms; it's hard to see the point of their complicated class hierarchy before one has tried to solve many different problems. I'm partial to matlab and python+numpy because, by focusing on the linear algebra side of things, they can help students move to a more abstract understanding of what's going on. Matlab has the great advantage of being the de facto standard for quick-and-dirty implementations found on the web, which might come in handy later in life if anyone pursues a career in ML.

Weiwei: definitely not Java+Weka. I used to use matlab (for the reasons Top gives), but now I use Python+NumPy. The main reason I changed was that doing feature engineering for the recommender system in matlab was really unpleasant (because it was largely text-based), and Python was much nicer.

I also hate that matlab isn't free, and I spent a huge amount of time making all my matlab scripts for class Octave-compatible, which was no fun. Plus, plotting in Octave is less than great.

Hal, are you going to write a book on this anytime soon? Sounds like a good project while you're driving cross country this summer. Ha ha.

@Hal @Top Thanks for the tips! As you said, Matlab is handy but expensive. Besides, I guess using open source programming languages is also beneficial to the students for their future career paths: After all, not so many companies develop real-world applications with Matlab. It seems that Python+NumPy is a good way to go. I will keep that in mind. Thanks!

Very cool!

Yeah, talking about entropy and mutual information seems to waste too much time and be largely orthogonal to the class. The only advantage is that it's in the book (Mitchell).

I am also not sure if all my students really "got" the importance of generalization as much as they should have. Perhaps underlining it as THE basic concept is the way to go.

As for programming: I allowed them to code in whatever language they wanted, and it was a bit of a disaster (for one thing, I couldn't really follow what some students were doing when they called weird libraries in languages I don't use).

I will probably require them all to learn Matlab/Octave the next time since I am very familiar with Matlab. The non-freeness of Matlab and the differences between Matlab and Octave are definitely annoying, but it still seems like the least painful option for now.

Hi! What do you think about using R for teaching machine learning? I used it for a data mining class and it fared well, but the course was oriented more towards application (data analysis), rather than implementation.

I'd also note that "statistical learning" books start with linear vs. k-NN classifiers and a discussion along the lines of your point 2 (decision trees seem more ML-ish).

Very cool!

Since I come to ML from an NLP background, I'd also add some simple MLE probabilistic "classifier" before the decision trees (i.e., "choose the most probable class"). This is very intuitive, and it lays the groundwork for HMMs later.
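A sketch of that baseline, assuming "choose the most probable class" means predicting the label with the highest maximum-likelihood prior, i.e. its relative frequency in the training labels. The function names and the tag data are hypothetical:

```python
from collections import Counter


def fit_class_prior(labels):
    """Maximum-likelihood estimate of p(y): each label's relative frequency."""
    counts = Counter(labels)
    return {label: c / len(labels) for label, c in counts.items()}


def most_probable_class(prior):
    """The 'classifier' ignores its input and always returns argmax_y p(y)."""
    return max(prior, key=prior.get)


# Hypothetical part-of-speech tags from a tiny training corpus.
tags = ["NN", "VB", "NN", "DT", "NN", "VB"]
prior = fit_class_prior(tags)
print(most_probable_class(prior))  # NN
```

The accuracy of this constant prediction (here 3 of 6 tags) is the floor a real classifier has to beat, and the same counting is how the transition and emission tables of a supervised HMM are estimated.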

Also, perceptrons can come before KNN: you can treat the feature vectors as just "feature collections" when doing perceptron, and everything still works. Then discuss the vector representation, linear classification, and the fact that other vector-space models are possible (perhaps also touching on the duality of the perceptron).
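A minimal sketch of that idea, assuming each example is just a Python dict mapping feature names to values (a "feature collection") and labels are +1/-1. The `<bias>` feature name, the helper names, and the toy word features are hypothetical. Nothing in the update rule mentions vectors or geometry; the learned weights can only later be read as a vector defining a linear boundary:

```python
def score(weights, features):
    """Sparse dot product: only the features present contribute."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def with_bias(features):
    """Add an always-on feature so the bias is learned like any other weight."""
    return {**features, "<bias>": 1.0}


def train_perceptron(data, epochs=10):
    """Plain (unaveraged) perceptron. data is a list of (features, label),
    with features a dict {name: value} and label +1 or -1."""
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            feats = with_bias(features)
            if label * score(weights, feats) <= 0:
                # Mistake: nudge every active feature toward the true label.
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
                mistakes += 1
        if mistakes == 0:
            break  # a full pass with no mistakes: training data is separated
    return weights


def predict(weights, features):
    return 1 if score(weights, with_bias(features)) > 0 else -1


# Hypothetical word-presence features for "did the student like the class?"
data = [
    ({"great": 1, "fun": 1}, +1),
    ({"boring": 1}, -1),
    ({"fun": 1, "project": 1}, +1),
    ({"boring": 1, "project": 1}, -1),
]
weights = train_perceptron(data)
print(predict(weights, {"great": 1}), predict(weights, {"boring": 1}))  # 1 -1
```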

For implementation language, I definitely support the python+numpy duo. I would supplement it with IPython as a nifty interactive python shell (which also allows for easy online plotting), and the CVXOPT package for convex optimization (which makes implementing SVMs a pretty easy assignment).

Hal, given this great introductory sequence, I am curious: how do you go about explaining EM?

@luk: I don't know R :).

@yoav: I *very* intentionally do NOT do probabilistic classification until later. This is because probability scares most students and they don't really get it. I remember taking AI as an undergrad and being lost with naive Bayes.

@yoav: what you say about perceptron is actually exactly what I do... we talk about perceptron as feature weights, and then ask "what does the decision boundary look like" (just like we did in kNN) and then see that it's linear, blah blah blah and then enter linear algebra.
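The "what does the decision boundary look like" question can be checked numerically. A sketch assuming two real-valued features and hypothetical weights `w` and bias `b` (say, from a perceptron run): the score w1*x1 + w2*x2 + b is zero exactly on a line with slope -w1/w2, and the prediction flips as you cross it. Function names are illustrative only:

```python
def linear_predict(w, b, x):
    """Linear classifier: +1 if w . x + b > 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def boundary_x2(w, b, x1):
    """The x2 where w1*x1 + w2*x2 + b = 0, i.e. the point on the decision
    boundary above x1 (two features only; assumes w2 != 0)."""
    w1, w2 = w
    return -(w1 * x1 + b) / w2


# Hypothetical learned weights and bias.
w, b = (2.0, 1.0), -4.0
for x1 in (0.0, 1.0, 3.0):
    x2 = boundary_x2(w, b, x1)
    # Since w2 > 0, points just above the line score positive and
    # points just below it score negative.
    above = linear_predict(w, b, (x1, x2 + 0.1))
    below = linear_predict(w, b, (x1, x2 - 0.1))
    print(f"x1={x1}: boundary at x2={x2}, above -> {above}, below -> {below}")
```

Because the boundary points all satisfy one linear equation, they lie on a straight line; with more features the same equation describes a hyperplane, which is the natural entry point to linear algebra.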

I would have thought of teaching machine learning following "Collective Intelligence" (http://amzn.to/bHSb2k). It uses Python, and it's concise, to the point, and shows just the right amount of information.

@boris

Collective Intelligence is a great book, but I wouldn't teach a CS ML class based on it: not enough foundations. It would be a good candidate for a more applied data-mining / data-analysis course (and even then it would need to be supplemented with newer material like SVMs, which, if I remember correctly, are not covered).

Great ideas, Hal.

I agree with your point about feature engineering. In my NLP course I emphasize its role in the process of designing good models. All of my labs require some error analysis, and two of them require feature engineering specifically. Some students come to the course with machine learning experience, and some do not, but all seem to enjoy the idea of bringing their knowledge and insights to bear in a machine learning setting. Students also benefit from discussing questions about empiricism versus rationalism in this context.

I'd be happy to be your writing peon. I have always been interested in ML from a pedagogical perspective.

Great post! Thanks :-) I am yet to see an ML book that does not scare people. I wonder if the problem lies with the lack of adequate visualization and correlation tools. For example I would love to create a single example which I solve using the common ML techniques, demonstrate the effect of different params and explain what works/does not work here and why!

Hal, nice and interesting post. However, I think linear classifiers should be introduced with their geometric interpretation in view. As for probabilistic classification, wouldn't it be better to use graphical models, since they seem to make more sense? I attended a winter school where an introduction to HMMs using graphical models made the point clearer than the traditional approach. Anyway, great post!

