06 July 2011

The conference(s) post: ACL and ICML

I'm using ACL/ICML as an excuse to jumpstart my resumed, hopefully regular, posting.  The usual "I didn't see/read everything" applies to all of this.  My general feeling about ACL (which was echoed by several other participants) was that the program was quite strong, but there weren't many papers that really stood out as especially great.  Here are some papers I liked and some attached thoughts, from ACL:

P11-1002 [bib]: Sujith Ravi; Kevin Knight
Deciphering Foreign Language
This paper is about building MT systems without parallel data.  There's been a bunch of work in this area.  The idea here is that if I have English text, I can build an English LM.  If you give me some French text and I hallucinate an F2E MT system, then its output had better score high on the English LM.
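
To make the "output had better score high on the English LM" idea concrete, here is a toy sketch.  It is not the method from the paper: the tiny corpora, the add-one-smoothed bigram LM, and the two hand-written word mappings (`good`, `bad`) are all assumptions made up for illustration.  It only shows that a sensible hallucinated dictionary scores better under the English LM than a scrambled one.

```python
import math
from collections import Counter

# Toy monolingual English text (made up for illustration).
english = "the cat sleeps . the dog sleeps . the cat eats .".split()
unigrams = Counter(english)
bigrams = Counter(zip(english, english[1:]))
vocab_size = len(unigrams)

def lm_logprob(words):
    """Add-one-smoothed bigram log-probability under the toy English LM."""
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(words, words[1:])
    )

# Toy monolingual French text; no parallel data is used anywhere.
french = "le chat dort . le chien dort .".split()

def translate(mapping, words):
    """Word-for-word 'MT system': look each word up, pass unknowns through."""
    return [mapping.get(w, w) for w in words]

# Two hallucinated F2E dictionaries (both hypothetical).
good = {"le": "the", "chat": "cat", "chien": "dog", "dort": "sleeps"}
bad = {"le": "sleeps", "chat": "the", "chien": "dog", "dort": "cat"}

good_score = lm_logprob(translate(good, french))
bad_score = lm_logprob(translate(bad, french))
print(good_score > bad_score)
```

Note that swapping "cat" and "dog" in `good` gives the same bigrams here, so the LM alone cannot always pin down the mapping; a real system would need much more text and a proper search or inference procedure.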

P11-1020 [bib] [dataset]: David Chen; William Dolan
Collecting Highly Parallel Data for Paraphrase Evaluation
Although this paper is about paraphrasing, the fun part is the YouTube stuff they did.  Read it and see :).

P11-1060 [bib]: Percy Liang; Michael Jordan; Dan Klein
Learning Dependency-Based Compositional Semantics
This paper is along the lines of semantic parsing stuff that various people (Ray Mooney, Luke Zettlemoyer/Mike Collins, etc.) have been doing.  It's a nice compositional model that is learned online.

P11-1099 [bib]: Vanessa Wei Feng; Graeme Hirst
Classifying arguments by scheme
This paper is about argumentation (in the "debate" sense) and identifying different argumentation types.  There are some nice correlations with discourse theory, but in a different context.

P11-2037 [bib]: Shu Cai; David Chiang; Yoav Goldberg
Language-Independent Parsing with Empty Elements
I'm really glad to see that people are starting to take this problem seriously again.  This falls under the category of "if you've ever actually tried to use a parser to do something then you need this."

Okay so that's not that many papers, but I did "accidentally" skip some sections.  So you're on your own for the rest.

For ICML, I actually felt it was more of a mixed bag.  Here are some things that stood out as cool:

Minimum Probability Flow Learning 
Jascha Sohl-Dickstein; Peter Battaglino; Michael DeWeese
This is one that I need to actually go read, because it seems too good to be true.  If computing a partition function ever made you squirm, read this paper.

Tree-Structured Infinite Sparse Factor Model 
XianXing Zhang; David Dunson; Lawrence Carin
This is trying to do factor analysis with tree factors; they use a "multiplicative gamma process" to accomplish it. This is something we tried to do a while ago, but could never really figure out how to do it.

Sparse Additive Generative Models of Text 
Jacob Eisenstein; Amr Ahmed; Eric Xing
The idea here is that if you're learning a model of text, don't re-learn the same "general background" distribution over and over again.  Instead, learn it once, then learn class- or topic-specific stuff as a sparse amendment to that background.
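
A minimal sketch of that "background plus sparse amendment" picture, assuming the class-specific word distribution is a softmax over background log-frequencies plus a mostly-zero deviation vector.  The vocabulary, the numbers, and the function name `sage_distribution` are made up for illustration, and nothing here does the actual learning.

```python
import math

def sage_distribution(background_log_freq, sparse_deviation):
    """p(w) proportional to exp(background[w] + deviation[w]).

    Words missing from sparse_deviation get a deviation of 0, so they
    simply inherit the background (up to renormalization).
    """
    scores = {
        w: m + sparse_deviation.get(w, 0.0)
        for w, m in background_log_freq.items()
    }
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {w: math.exp(s - log_z) for w, s in scores.items()}

# Shared background, stored once (hypothetical numbers).
background = {
    "the": math.log(0.5),
    "of": math.log(0.3),
    "goal": math.log(0.1),
    "vote": math.log(0.1),
}

# A "sports" class only needs to say how it differs from the background.
sports = {"goal": 1.5}

p_plain = sage_distribution(background, {})
p_sports = sage_distribution(background, sports)
print(p_plain["goal"], p_sports["goal"])
```

With an empty deviation you just get the background back; the class only pays for the handful of words it actually cares about.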

OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning 
Arvind Sujeeth; HyoukJoong Lee; Kevin Brown; Tiark Rompf; Hassan Chafi; Michael Wu; Anand Atreya; Martin Odersky; Kunle Olukotun
Six more words: Most authors ever on ICML paper.

Generalized Boosting Algorithms for Convex Optimization 
Alexander Grubb; Drew Bagnell
Suppose you want to boost something that's non-smooth?  Now you can do it.  Has nice applications in imitation learning, which is I suppose why I like it.

Learning from Multiple Outlooks 
Maayan Harel; Shie Mannor
This is a nice approach based on distribution mapping to the problem of multiview learning when you don't have data with parallel views.  (I'm not sure that we need a new name for this task, but I still like the paper.)

Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher; Cliff Chiung-Yu Lin; Andrew Ng; Chris Manning
This is basically about learning compositional semantics for vector space models of text, something that I think is really interesting and understudied (Mirella Lapata has done some stuff).  The basic idea is that if "red" is embedded at position x, and "sparrow" is embedded at y, then the embedding of the phrase "red sparrow" should be at f([x y]) where f is some neural network.  The whole thing is trained to get good representations for parsing.
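
As a rough sketch of the f([x y]) composition step (not the authors' code), assume f is a single tanh layer applied to the concatenated child vectors.  The dimensionality, the random untrained weights, and the names `compose`, `W`, and `b` are all made up for illustration.

```python
import math
import random

def compose(x, y, W, b):
    """Parent vector p = tanh(W [x; y] + b) for child vectors x and y."""
    child = x + y  # list concatenation plays the role of [x; y]
    return [
        math.tanh(sum(w * c for w, c in zip(row, child)) + bias)
        for row, bias in zip(W, b)
    ]

random.seed(0)
d = 4  # embedding size (arbitrary)

# Untrained parameters: W is d x 2d, b has length d.
W = [[random.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d

# Stand-in word embeddings (random here; learned in a real model).
the = [random.uniform(-1, 1) for _ in range(d)]
red = [random.uniform(-1, 1) for _ in range(d)]
sparrow = [random.uniform(-1, 1) for _ in range(d)]

# The phrase vector lives in the same space as the word vectors,
# so the same function can be applied again one level up the tree.
red_sparrow = compose(red, sparrow, W, b)
the_red_sparrow = compose(the, red_sparrow, W, b)
print(len(the_red_sparrow))
```

The point of the design is the last two lines: because the output has the same size as each input, you can keep composing all the way up a parse tree.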

Please reply in comments if you had other papers you liked!!!


Justin said...

What do you think about the text generating RNNs from an NLP perspective?

Charles Sutton said...

Thanks for the pointer to the Sohl-Dickstein et al. paper. I wouldn't say it's too good to be true. It's kind of like contrastive divergence, except that it's based on deterministic optimization of the associated objective function, which is a nice idea. It's only practical if the MCMC kernel you're using is "sparse" in some sense, which is a bit of a limitation.