The holidays are upon us (hence the lack of posts), but I wanted to give a brief nod to NIPS, which I thought went pretty well this year. The highlights for me (note that I didn't attend everything, so the usual qualifications apply) were:
- The Netflix Prize talk: Bennett talked about the origins of the prize and how well people are doing. They seem to firmly believe that the goal will be met, and there's some indication it won't actually take all that long. Leading the pack at the time of the talk was wxyz, who is one of my old CMU buddies, Yi Zhang, now at UCSC. Nearly tied for first was Geoff Hinton's group up at Toronto. No details were given about how people are managing to do so well, but it seems reasonable to assume that Toronto is doing something with deep belief nets (more later).
- Free Lunches (Invited talk by Dan Ariely). This was probably my favorite talk of the whole conference. The gist of the talk is that it's really easy to get humans to behave in ways that defy the standard "cost/benefit analysis" setting. A few fun examples: Split a bunch of people into two groups. Have one group write down 3 things they like about their significant other; have the other group write 10 things. Then ask them how much they love their SO. The result is that the "3 things" group loves them much more (presumably because no one can actually list 10 things). There were also a lot of examples about how often people cheat; the basic result seems to be that knowing you cannot be caught does not necessarily make you cheat more. Also, if you prime people by having them sign an honor code, they cheat less. There were many more examples. I'm not quite positive what the take-home message was, but it was a very interesting talk.
- Analysis of Representations for Domain Adaptation (Blitzer et al). This is the first compelling analysis I've seen of the domain adaptation problem. The basic idea is to bound generalization error on the target domain in terms of the distance between the source and target distributions. Quite clever, and John says that they may even try to develop algorithms that explicitly minimize this bound.
- Boosting Structured Prediction for Imitation Learning (Ratliff et al). A cute feature-boosting algorithm for SP problems, used for the "following a map" problem.
- Large Margin Gaussian MMs for ASR (Sha, Saul). Do standard Gaussian mixture modeling for ASR, but put a large margin constraint on the Gaussians. Do a hard-EM-like thing to get fixed cluster assignments, and you can write this as a semi-definite program. Very cute.
- Greedy Layer-wise Training of Deep Networks (Bengio et al). Deep belief nets can do a really good job at some vision problems, but they're quite hard to train (symmetries, etc.). The basic idea here is to initialize the training by a simple layer-at-a-time method. The thing I found especially interesting is that their initialization looks a lot like a "predict yourself" sort of strategy, which seems to be a recurring theme in a lot of unsupervised/semi-supervised learning problems.
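
To make the layer-at-a-time, "predict yourself" idea from the last item a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions, not the setup from the paper: each layer is a small tied-weight sigmoid autoencoder trained by SGD to reconstruct its own input, and the names (`train_autoencoder`, `greedy_pretrain`), the toy data, and the hyperparameters are all made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    # Hidden code: h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def decode(W, c, h):
    # Tied weights (an assumption of this sketch): the decoder reuses W transposed.
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i])
            for i in range(len(c))]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one layer to "predict itself": reconstruct its own input."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # encoder biases
    c = [0.0] * n_in      # decoder biases
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            r = decode(W, c, h)
            # Gradient of 0.5 * squared error at the output pre-activations.
            d_out = [(ri - xi) * ri * (1.0 - ri) for ri, xi in zip(r, x)]
            # Backpropagate to the hidden pre-activations.
            d_hid = [sum(W[j][i] * d_out[i] for i in range(n_in)) * h[j] * (1.0 - h[j])
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_in):
                    # A tied weight collects gradient from decoder and encoder.
                    W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x[i])
                b[j] -= lr * d_hid[j]
            for i in range(n_in):
                c[i] -= lr * d_out[i]
    return W, b, c


def greedy_pretrain(data, layer_sizes, **train_kwargs):
    """Train one layer at a time; each new layer sees the previous layer's codes."""
    layers = []
    current = data
    for n_hidden in layer_sizes:
        W, b, _ = train_autoencoder(current, n_hidden, **train_kwargs)
        layers.append((W, b))  # keep only the encoder as the initialization
        current = [encode(W, b, x) for x in current]
    return layers, current


if __name__ == "__main__":
    # Toy one-hot inputs, purely for illustration.
    toy = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    layers, codes = greedy_pretrain(toy, [3, 2])
    for x, code in zip(toy, codes):
        print(x, [round(v, 3) for v in code])
```

The weights returned by `greedy_pretrain` would then serve as the starting point for ordinary supervised training of the whole stack; that fine-tuning step is left out here.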