12 April 2010

How I teach machine learning

I've had discussions about this with tons of people, and it seems like my approach is fairly odd. So I thought I'd blog about it, since I've put a lot of thought into it over the past four offerings of the machine learning course here at Utah.

At a high level, if there is one thing I want my students to remember after the semester is over, it's the idea of generalization and how it relates to function complexity. That's it. More operationally, I'd like them to learn SVMs (and kernels) and EM for generative models.

In my opinion, the whole tenor of the class is set by how it starts. Here's how I start.
  1. Decision trees. No entropy. No mutual information. Just decision trees based on classification accuracy. Why? Because the point isn't to teach them decision trees. The goal is to get, as quickly as possible, to where we can talk about things like generalization and function complexity. Why decision trees? Because EVERYONE gets them. They're so intuitive, and analogies to 20 questions abound. We also talk about the whole notion of data being drawn from a distribution and what it means to predict well in the future.

  2. Nearest neighbor classifiers. No radial basis functions, no locally weighted methods, etc. Why? Because I want to introduce the idea of thinking of data as points in high-dimensional space. This is a big step for a lot of people, and one that takes some getting used to. We then do k-nearest neighbor and relate it to generalization, overfitting, etc. The punch line of this section is the idea of a decision boundary and the complexity of decision boundaries.

  3. Linear algebra and calculus review. At this point, they're ready to see why these things matter: we've already hinted at learning as some sort of optimization (via decision trees) and at data living in high dimensions (via kNN), hence calculus and linear algebra. Note: no real probability here.

  4. Linear classifiers as methods for directly optimizing a decision boundary. We start with 0-1 loss and then move to perceptron. Students love perceptron because it's so procedural.
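
For step 4, here is a minimal Python/NumPy sketch of the two ideas: 0-1 loss as the fraction of misclassified points, and the mistake-driven perceptron update. It assumes labels in {-1, +1}; the toy data and the function names are illustrative assumptions, not code from the course. Because 0-1 loss is piecewise constant in the weights (its gradient is zero almost everywhere), a procedural update rule like the perceptron is a natural next step.

```python
import numpy as np

def zero_one_loss(w, b, X, y):
    """Fraction of examples misclassified by sign(w.x + b); y must be in {-1, +1}."""
    preds = np.where(X @ w + b > 0, 1, -1)
    return float(np.mean(preds != y))

def perceptron(X, y, epochs=10):
    """Classic perceptron: on each mistake, nudge (w, b) toward the example."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # "<= 0" counts points on the boundary as mistakes,
            # so the all-zero starting weights trigger an update.
            if y_i * (x_i @ w + b) <= 0:
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:  # a full clean pass: the training data are separated
            break
    return w, b

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(w, b, zero_one_loss(w, b, X, y))
```

On this toy data the perceptron makes one mistake on the first point, then completes a clean pass, ending with w = [2, 1], b = 1 and zero training error.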

The rest mostly follows the same path as almost every other machine learning course out there. But IMO these first four days are crucial. I've tried (in the past) starting with linear regression or linear classification, and it's just a disaster: you spend too much time talking about unimportant stuff. The intro with error-based decision trees moving to kNN is amazingly useful.
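
To make the "error-based decision trees, then kNN" sequence concrete, here is a small sketch in Python+NumPy (the setup mentioned in the comments below). The synthetic dataset, the function names, and the particular depths and k values are all assumptions for illustration. The tree scores splits purely by training accuracy, as in step 1; kNN uses plain Euclidean distance, as in step 2. Depth and k act as complexity knobs: with this code, training accuracy cannot decrease as the depth limit grows, and 1-NN fits this training set perfectly, while test accuracy will typically peak somewhere in between. That gap is the generalization point.

```python
import numpy as np
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels.tolist()).most_common(1)[0][0]

def n_correct(labels):
    """How many of these points a majority-vote leaf gets right."""
    return Counter(labels.tolist()).most_common(1)[0][1]

def build_tree(X, y, max_depth):
    """Greedy tree whose splits are scored by training accuracy alone (no
    entropy). A split is kept only if it strictly increases the number of
    correctly classified training points."""
    if max_depth == 0 or n_correct(y) == len(y):
        return {"label": majority(y)}
    best, best_score = None, n_correct(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:  # skipping the max keeps both sides non-empty
            left = X[:, f] <= t
            score = n_correct(y[left]) + n_correct(y[~left])
            if score > best_score:
                best, best_score = (f, t, left), score
    if best is None:  # no single split helps (e.g. XOR-like data)
        return {"label": majority(y)}
    f, t, left = best
    return {"feature": f, "threshold": t,
            "left": build_tree(X[left], y[left], max_depth - 1),
            "right": build_tree(X[~left], y[~left], max_depth - 1)}

def tree_predict(tree, x):
    while "label" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k training points closest in Euclidean distance."""
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1), kind="stable")[:k]
    return majority(y_train[nearest])

def accuracy(predict, X, y):
    return float(np.mean([predict(x) == label for x, label in zip(X, y)]))

# Synthetic data: a linear boundary plus about 10% label noise (made up for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(200) < 0.1
y[flip] = 1 - y[flip]
X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]

for depth in [0, 1, 2, 4, 8]:
    tree = build_tree(X_tr, y_tr, depth)
    predict = lambda x: tree_predict(tree, x)
    print(f"depth={depth}: train={accuracy(predict, X_tr, y_tr):.2f} "
          f"test={accuracy(predict, X_te, y_te):.2f}")
for k in [1, 5, 15, 45]:
    predict = lambda x: knn_predict(X_tr, y_tr, x, k)
    print(f"k={k}: train={accuracy(predict, X_tr, y_tr):.2f} "
          f"test={accuracy(predict, X_te, y_te):.2f}")
```

Running it prints train and test accuracy for each depth and each k; the exact numbers depend on the random seed, so none are quoted here.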

The sad thing is that there are basically no books that follow any order even remotely like this. Except... drum roll... it's actually not far from what Mitchell's book does, although he does kNN much later. It's really depressing how bad most machine learning books are from a pedagogical perspective... you'd think that in 12 years someone would have written something that works better.

On top of that, the most recent time I taught ML, I structured everything around recommender systems. You can actually make it all work, and it's a lot of fun. We actually built recommender systems for classes here at the U (I had 90-odd students from the previous semester's AI class fill out ratings on classes they'd taken in the past). The data was a bit sparse, but it was still a lot of fun.
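
One way the kNN idea carries over to the course-recommender setting is user-based nearest-neighbor prediction: to guess how a student would rate a course, average the ratings given by the most similar students who took it. This is a sketch under assumptions, not the actual course assignment: the ratings matrix is a tiny made-up example (NaN marks a course not taken), and similarity is a simple negative mean absolute difference over co-rated courses; cosine or Pearson similarity are common alternatives.

```python
import numpy as np

# Hypothetical toy data: rows are students, columns are courses,
# ratings 1-5, NaN = the student never took that course.
R = np.array([
    [5, 4, np.nan, 1],
    [4, 5, 2, np.nan],
    [1, np.nan, 5, 4],
    [np.nan, 1, 4, 5],
])

def similarity(a, b):
    """Negative mean absolute difference over courses both students rated
    (higher means more alike); -inf if they share no courses."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return -np.inf
    return -float(np.mean(np.abs(a[both] - b[both])))

def predict_rating(R, student, course, k=2):
    """Average the ratings of the k most similar students who took `course`."""
    raters = [s for s in range(R.shape[0])
              if s != student and not np.isnan(R[s, course])]
    # sorted() is stable even with reverse=True, so ties keep row order.
    raters = sorted(raters, key=lambda s: similarity(R[student], R[s]), reverse=True)
    neighbors = raters[:k]
    if not neighbors:
        return float(np.nanmean(R))  # nobody took it: fall back to the global mean
    return float(np.mean([R[s, course] for s in neighbors]))

print(predict_rating(R, 0, 2, k=1), predict_rating(R, 0, 2, k=2))
```

For student 0 and course 2, the closest student (row 1) rated it 2, so `predict_rating(R, 0, 2, k=1)` returns 2.0; with k=2 the next neighbor's rating of 5 pulls the prediction up to 3.5. Here k plays the same complexity role as in plain kNN classification.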

The other thing I changed most recently, and that I'm very happy with, is a full project on feature engineering (it ties in to the course recommender system idea). Why? Because most people who take ML, if they ever use it at all, will need to do this. It may be one of the most important things they'll have to learn, so we should try to teach it. Again, it's something no one ever talks about in books.

Anyway, that's my set of tricks. If you have some that you particularly like, feel free to share!


  1. Hal, thanks for the great post! I really like this way of teaching, which balances nicely between practical use and theoretical reasoning.

    May I ask which programming language you recommend for the class? Have you chosen a general-purpose language like Java or Python, or a numerical computing environment such as Matlab? I think for an undergraduate class it is a very important question. In our school it is currently done with Java+Weka, but recently I've come to believe that Matlab makes more sense and lets students gain more insight into the algorithms. Any opinion?

  2. Weiwei: I think weka has quite a high barrier to entry if your main objective is understanding and implementing learning algorithms; it's hard to see the point of its complicated class hierarchy before one has tried to solve many different problems. I'm partial to matlab and python+numpy because, by focusing on the linear algebra side of things, they can help students move to a more abstract understanding of what's going on. Matlab has the great advantage of being the de facto standard for quick-and-dirty implementations found on the web, which might come in handy later in life if anyone pursues a career in ML.

  3. Weiwei: definitely not Java+Weka. I used to use matlab (for the reasons Top gives), but now I use Python+NumPy. The main reason I switched is that doing feature engineering for the recommender system in matlab was really unpleasant (because it was largely text-based), and Python was much nicer.

  4. I also hate that matlab isn't free, and I spent a huge amount of time making all my matlab scripts for the class Octave-compatible, which was no fun. Plus, plotting in Octave is less than great.

  5. Hal, are you going to write a book on this anytime soon? Sounds like a good project while you're driving cross country this summer. Ha ha.

  6. @Hal @Top Thanks for the tips! As you said, Matlab is handy but expensive. Besides, I guess using open-source programming languages also benefits students' future career paths: after all, not many companies develop real-world applications in Matlab. It seems that Python+NumPy is a good way to go. I will keep that in mind. Thanks!

  7. Very cool!

    Yeah, talking about entropy and mutual information seems to waste too much time and to be largely orthogonal to the class. The only advantage is that it's in the book (Mitchell).

    I am also not sure if all my students really "got" the importance of generalization as much as they should have. Perhaps underlining it as THE basic concept is the way to go.

    As for programming: I allowed them to code in whatever language they wanted, and it was a bit of a disaster (for one thing, I couldn't really follow what some students were doing when they called weird libraries in languages I don't use).

    I will probably require them all to learn Matlab/Octave the next time since I am very familiar with Matlab. The non-freeness of Matlab and the differences between Matlab and Octave are definitely annoying, but it still seems like the least painful option for now.

  8. Hi! What do you think about using R for teaching machine learning? I used it for a data mining class and it fared well, but the course was oriented more towards application (data analysis), rather than implementation.

    I'd also note that "statistical learning" books start by contrasting linear and k-NN classifiers, much like the discussion in your point 2 (decision trees seem more ML-ish).

  9. Very cool!

    Since I came to ML from an NLP background, I'd also add a simple MLE probabilistic "classifier" before the decision trees (i.e., "choose the most probable class"). This is very intuitive and lays the groundwork for HMMs later.

    Also, perceptrons can come before kNN: you can treat the feature vectors as just "feature collections" when doing perceptron, and everything still works. Then discuss the vector representation, linear classification, and the fact that other vector-space models are possible (perhaps also touching on the duality of the perceptron).

    For the implementation language, I definitely support the python+numpy duo. I would supplement it with IPython as a nifty interactive Python shell (which also allows for easy online plotting), and the CVXOPT package for convex optimization (which makes implementing SVMs a pretty easy assignment).

    Hal, given this great introductory sequence, I am curious: how do you go about explaining EM?

  10. @luk: I don't know R :).

    @yoav: I *very* intentionally do NOT do probabilistic classification until later. This is because probability scares most students and they don't really get it. I remember taking AI as an undergrad and being lost with naive Bayes.

    @yoav: what you say about perceptron is actually exactly what I do... we talk about perceptron as feature weights, and then ask "what does the decision boundary look like" (just like we did in kNN) and then see that it's linear, blah blah blah and then enter linear algebra.

  11. I would have thought of teaching machine learning following "Collective Intelligence" (http://amzn.to/bHSb2k). It uses Python, is concise and to the point, and gives just the right amount of information.

  12. @boris

    Collective Intelligence is a great book, but I wouldn't teach a CS ML class based on it: not enough foundations. It would be a good candidate for a more applied data-mining / data-analysis course (and even then it would need to be supplemented with newer material like SVMs, which, if I remember correctly, are not covered).

  13. Nice post, thanks for sharing this wonderful and useful information with us.


  14. Great ideas, Hal.

    I agree with your point about feature engineering. In my NLP course I emphasize its role in the process of designing good models. All of my labs require some error analysis, and two of them require feature engineering specifically. Some students come to the course with machine learning experience, and some do not, but all seem to enjoy the idea of bringing their knowledge and insights to bear in a machine learning setting. Students also benefit from discussing questions about empiricism versus rationalism in this context.

  15. Great idea! Love seeing a creative mind work and gain success!!!!!! Hope it continues to grow!


  16. I'd be happy to be your writing peon. I have always been interested in ML from a pedagogical perspective.

  17. Great post! Thanks :-) I have yet to see an ML book that does not scare people. I wonder if the problem is a lack of adequate visualization and correlation tools. For example, I would love to take a single example, solve it with the common ML techniques, demonstrate the effect of different parameters, and explain what works, what does not, and why!

  18. Hal, nice and interesting post... However, I think linear classifiers should be introduced with their geometric meaning in view. As far as probabilistic classification is concerned, wouldn't it be great to use graphical models, since they seem to make more sense? I attended a winter school where an introduction to HMMs using graphical models made the point more clearly than the traditional approach... Anyway, great post...
