I'm out in Chicago at the Atomic Learning workshop, and yesterday was structured prediction day. There were a lot of cool talks covering pretty much all the major SP techniques: Charles talked about efficient training of CRFs, Yasemin talked about semi-supervised learning in M3Ns and SVMISOs, Ben talked about efficient training of M3Ns for combinatorial-style problems (like word alignment), Drew talked about planning as an SP problem, Vasin talked about the tradeoffs of local vs. global inference in SP, and I of course talked about search.
I really enjoyed Drew's talk. He is considering the problem of robot navigation, which is typically handled using A* search plus some very crafty hand-written heuristic rules for generating the cost map over which A* runs. He wants to get rid of the crafty part and replace it with learning. The idea is simple and cute: use the features the robot observes to learn a cost map such that, when A* runs over it, it finds the right path. Here, he defines the right path by having a human remote-control the robot and drive it to the correct place. He casts this as a max-margin problem and solves it with very fast subgradient methods. There's no paper on this yet, but if all goes well there will be soon. I think there's a lot of stuff here that could strongly influence how we think about SP.
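Since there's no paper yet, the details below are my own guesses, but the flavor of the idea can be sketched. Assume each grid cell has a feature vector and its cost is linear in those features; run the planner under the current costs, and take a subgradient step that raises the cost of cells the planner used and lowers the cost of cells the demonstration used, until the planner reproduces the demonstration. Everything here (the grid setup, Dijkstra standing in for A*, function names, the absence of the margin/loss-augmentation term) is my invention, not Drew's actual method:

```python
import heapq
import numpy as np

def shortest_path(costs, start, goal):
    """Dijkstra on a 4-connected grid with per-cell costs
    (equivalently, A* with a zero heuristic)."""
    H, W = costs.shape
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        r, c = u
        for v in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if 0 <= v[0] < H and 0 <= v[1] < W:
                nd = d + costs[v]
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(pq, (nd, v))
    path, u = [goal], goal          # walk back from goal to start
    while u != start:
        u = prev[u]
        path.append(u)
    return path[::-1]

def path_feature_counts(path, features):
    """Sum of the feature vectors of the cells a path visits."""
    return sum(features[cell] for cell in path)

def mmp_train(features, demos, n_iters=50, lr=0.1):
    """Learn weights w so that each demonstrated path is cheapest
    under per-cell cost w . f(cell).  The objective for one demo is
    cost_w(demo) - min_path cost_w(path), whose subgradient is
    F(demo) - F(planner's path).  (A real max-margin version would
    also loss-augment the planner's costs; omitted here.)"""
    w = np.ones(features.shape[-1])
    for _ in range(n_iters):
        for start, goal, demo_path in demos:
            costs = np.clip(features @ w, 1e-3, None)
            plan = shortest_path(costs, start, goal)
            g = (path_feature_counts(demo_path, features)
                 - path_feature_counts(plan, features))
            w -= lr * g
            w = np.clip(w, 1e-3, None)  # keep cell costs positive for search
    return w
```

On a toy 3x3 grid where one "rough" cell sits on the direct route and the demonstration detours around it, the learned weights make the rough feature expensive enough that the planner takes the detour too. Once the planner's path matches the demonstration, the subgradient is zero and the weights stop moving.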
There were many questions raised at the workshop that deserve significant thought. These include:
- What is structured learning?
- How does structured learning relate to multi-task learning (à la Rich Caruana and others)?
- What sort of techniques are there for dealing with un- or partially-labeled data for structured learning?
- What can be done about featuritis (i.e., throwing in all possible features)? Also: Is this a legitimate concern?
- How much does it matter which loss function we optimize?
- When is local learning enough (à la Dan Roth) and when must you do "global learning"?
- What training techniques scale sufficiently?