I'm out in Chicago at the atomic learning workshop, and yesterday was structured prediction day. There were a lot of cool talks covering pretty much all the major SP techniques: Charles talked about efficient training of CRFs, Yasemin talked about semi-supervised learning in M3Ns and SVMISOs, Ben talked about efficient training of M3Ns for combinatorial-style problems (like word alignment), Drew talked about planning as an SP problem, Vasin talked about the tradeoffs of local vs. global inference in SP, and I of course talked about search.
I really enjoyed Drew's talk. He is considering the problem of robot navigation, which is typically handled using A* search over a cost map built from some very crafty hand-written heuristic rules. He wants to get rid of the crafty part and replace it with learning. The idea is simple and cute: use the features the robot observes to learn a cost map such that, when A* runs on it, it finds the right path. Here, the right path is defined by having a human remote-control the robot and drive it to the correct place. He casts this as a max-margin problem and solves it using very fast subgradient methods. There's no paper on this yet, but if all goes well there will be soon. I think there's a lot of stuff here that could strongly influence how we think about SP.
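To make the idea concrete, here's a rough sketch of what a single subgradient step might look like. This is purely my own reconstruction, not Drew's code: the `astar_path` planner, the per-cell feature layout, and the loss map are all assumptions on my part.

```python
import numpy as np

def cost_map(w, features):
    """Per-cell cost as an inner product of weights and observed features.

    features: (H, W, D) array of per-cell feature vectors.
    w:        (D,) weight vector being learned.
    """
    # Clip so costs stay positive and A* remains well-behaved.
    return np.maximum(features @ w, 1e-3)

def path_features(path, features):
    """Sum the feature vectors along a path (a list of (row, col) cells)."""
    return sum(features[r, c] for r, c in path)

def subgradient_step(w, features, demo_path, loss_map, lr, astar_path):
    """One max-margin update on a single human demonstration.

    astar_path(cost_grid) is assumed to return the cheapest path on the grid.
    loss_map gives a per-cell penalty (e.g., large for cells far from the
    demonstration), so the margin scales with how wrong a competing path is.
    """
    # Loss-augmented inference: find a path that is both cheap and "bad".
    augmented = np.maximum(cost_map(w, features) - loss_map, 1e-3)
    competitor = astar_path(augmented)

    # Subgradient of the hinge: demo-path features minus competitor features.
    g = path_features(demo_path, features) - path_features(competitor, features)
    return w - lr * g
```

The appealing part is that the expensive piece of each update is just another call to A* on a (loss-augmented) cost map, which is presumably why the subgradient approach is so fast.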
There were many questions raised at the workshop that deserve significant thought. These include:
- What is structured learning?
- How does structured learning relate to multi-task learning (a la Rich Caruana and others)?
- What sort of techniques are there for dealing with un- or partially-labeled data for structured learning?
- What can be done about featuritis (i.e., throwing in all possible features)? Also: Is this a legitimate concern?
- How much does it matter which loss function we optimize?
- When is local learning enough (a la Dan Roth), and when must you do "global learning"?
- What training techniques scale sufficiently?
2 comments:
I started a separate post on heuristics to discuss this. Please reply there.
I don't think structured learning is such a complicated approach: people have worked on prediction problems with structured outputs for years. For example, speech recognition (predicting a sequence of words), communication theory (decoding streams of bits), and so on are all so-called "structured learning" problems. It's just that the term was coined recently, and people have started looking at these problems in a more general light.
Hal, regarding your post on the TTI atomic learning workshop: I'd appreciate it if you could comment on each of the issues mentioned when you have time. :)