Before beginning, I'm really happy and somewhat humbled to have been able to participate in the quizbowl demo, which won the best demo award: congrats to Mohit Iyyer, He He and Jordan Boyd-Graber for doing all the hard work!
Here are some things I saw and liked, plus a few that other people pointed me to as things I would probably like but haven't actually seen myself.
- Logarithmic Time Online Multiclass prediction by Anna Choromanska and John Langford. This paper has been on arxiv for a while, and has been built into vw for a while also, so I've known about this stuff for probably a year now, but I still think it's great. The task is to build, in an online way, a binary tree for doing multiclass-to-binary reductions, in a way that gives you nice regret guarantees. The tree changes over time. One interesting property is that a single class can be represented in multiple leaves, which gives added representational capacity to common/difficult classes, something most approaches lack. (There's a toy sketch of the tree-routing idea after this list.)
- Probabilistic Line Searches for Stochastic Optimization by Maren Mahsereci and Philipp Hennig. I didn't see this paper, but I heard from several people that they liked it. The idea is that conventional optimization (e.g., BFGS) gets a lot of mileage out of line search, something we don't conventionally do in stochastic optimization (like SGD), but perhaps we should. It's on my reading list. (See the line-search sketch after this list for what the deterministic version looks like.)
- Grammar as a Foreign Language by Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever and Geoffrey Hinton. This paper treats parsing as the output of a sequence-to-sequence model, basically outputting a tree as a sequence of symbols (a tiny linearization example follows the list). This doesn't work if you just train it on labeled data. Instead, you have to train a state-of-the-art parser, run it on a huge pile of unlabeled data, and then train the seq2seq model to predict that. I like this because this is what Percy, Dan and I tried to get to work in the structured compilation paper, but could never make it work (probably because we lacked a sufficiently rich hypothesis class). It's basically more evidence, IMO, that seq2seq models are really good memorizers: I would be interested to see an analysis of how much generalization is actually learned (I suspect quite little).
- The Human Kernel by Andrew Wilson, Christoph Dann, Chris Lucas and Eric Xing. I like this paper in the same way I like Jerry Zhu's "machine teaching" work. It's about the interplay between human learning and machine learning. If you haven't read this paper, it's a fun read.
- Calibrated Structured Prediction by Volodymyr Kuleshov and Percy Liang. The observation in this paper is that confidence estimation is important for structured prediction problems, but faces its own challenges there. They approach it from a calibration perspective and adjust the notions of calibration to be appropriate for structured problems and marginal inference. (A quick calibration-check sketch appears after this list.)
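For the logarithmic-time multiclass paper, here's a minimal, made-up sketch of the underlying reduction: route an example down a binary tree of binary classifiers, so prediction costs one decision per level rather than one score per class. This is not the LOMtree algorithm from the paper (no online learning of the tree, no regret analysis); the Node class, DIM, and the hand-built tree are all invented for illustration. Note that two leaves can share a label, which is the "multiple leaves per class" property mentioned above.

```python
import numpy as np

DIM = 10  # toy feature dimension (made up)

class Node:
    """A tree node: internal nodes hold a linear router, leaves hold a class label."""
    def __init__(self, left=None, right=None, label=None):
        self.left, self.right, self.label = left, right, label
        self.w = None if label is not None else np.random.randn(DIM)

def predict(node, x):
    """Walk root-to-leaf: one binary decision per level, so O(depth) = O(log #leaves)."""
    while node.label is None:
        node = node.left if x @ node.w <= 0 else node.right
    return node.label

# Hand-built tree over classes {a, b, c}; class "a" appears at two leaves,
# giving it extra representational capacity.
tree = Node(
    left=Node(left=Node(label="a"), right=Node(label="b")),
    right=Node(left=Node(label="a"), right=Node(label="c")),
)
print(predict(tree, np.random.randn(DIM)))
```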
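For the probabilistic line search paper: I haven't read it yet, so the snippet below is only the classical, deterministic Armijo backtracking line search on a noise-free toy objective, to show what "line search" means in the batch setting. The paper's contribution (not shown here) is making this kind of step-size selection work with noisy minibatch evaluations via a probabilistic model; the objective f and all constants below are made up.

```python
import numpy as np

def f(w):
    return 0.5 * np.sum(w ** 2)      # toy quadratic objective (made up)

def grad(w):
    return w

def armijo_step(w, eta0=1.0, c=1e-4, shrink=0.5):
    """One gradient step with backtracking: shrink eta until sufficient decrease holds."""
    g = grad(w)
    d = -g                           # steepest-descent direction
    eta = eta0
    while f(w + eta * d) > f(w) + c * eta * (g @ d):
        eta *= shrink
    return w + eta * d

w = np.random.randn(5)
for _ in range(20):
    w = armijo_step(w)
print(f(w))                          # should be ~0 on this toy problem
```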
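For Grammar as a Foreign Language, the core trick is just that a parse tree can be written as a flat sequence of symbols, which is what the seq2seq model is trained to emit. Here's a tiny sketch of one possible linearization; the exact bracketing scheme in the paper differs (e.g., it emits POS tags rather than words), so treat this as illustrative only.

```python
def linearize(tree):
    """tree is either a terminal string, or a (label, children) pair."""
    if isinstance(tree, str):
        return [tree]                        # terminal: emit the word itself
    label, children = tree
    out = ["(" + label]
    for child in children:
        out += linearize(child)
    out.append(label + ")")
    return out

tree = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
print(" ".join(linearize(tree)))
# -> (S (NP John NP) (VP sleeps VP) S)
```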
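And for Calibrated Structured Prediction, the basic diagnostic behind calibration is easy to sketch: bin predicted confidences (say, per-token marginals) and compare each bin's average confidence to its empirical accuracy. The paper is about defining and achieving calibration properly in the structured/marginal-inference setting, which the toy check below doesn't attempt; the data here is synthetic.

```python
import numpy as np

def reliability(confidences, correct, n_bins=10):
    """Per bin: (mean predicted confidence, empirical accuracy, count)."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            rows.append((confidences[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return rows

# Synthetic per-token confidences; correctness drawn so the "model" is calibrated by construction.
conf = np.random.rand(1000)
corr = np.random.rand(1000) < conf
for mean_conf, acc, n in reliability(conf, corr):
    print(f"confidence ~{mean_conf:.2f}  accuracy {acc:.2f}  (n={n})")
```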
Overall, I enjoyed NIPS, especially catching up with old friends, doing a Talking Machines interview (should be posted early 2016), and seeing what's going on in a field whose size has increased dramatically since I started going eleven years ago!