A conference just ended, so it's that time of year! Here are some papers I liked with the usual caveats about recall.
Before I go to the list, let me say that I really really enjoyed ACL this year. I was completely on the fence about going, and basically decided to go only because of giving a talk at Repl4NLP, and wanted to attend the business meeting for the discussion of diversity in the ACL community, led by Joakim Nivre with an amazing report that he, Lyn Walker, Yejin Choi and Min-Yen Kan put together. (Likely I'll post more, separately, about both of these; for the latter, I tried to transcribe much of Joakim's presentation.)
All in all, I'm supremely glad I decided to go: it was probably my favorite conference in recent memory. This was not just because there were lots of great papers (there were!) but also because somehow it felt more like a large community conference than others I've attended recently. I'm not sure what made it like this, but I noticed it felt a lot less clique-y than NAACL, a lot more broad and interesting than ICML/NIPS (though that's probably because of my personal taste in research) and in general a lot friendlier. I don't know what the organizers did that managed this great combination, but it was great!
Okay, so on to the papers, sorted by ACL id.
P16-1009: Rico Sennrich; Barry Haddow; Alexandra Birch
Improving Neural Machine Translation Models with Monolingual Data
I like this paper because it has a nice solution to a problem I spent a year thinking about on-and-off and never came up with. The problem is: suppose that you're training a discriminative MT system (they're doing neural; that's essentially irrelevant). You usually have far more monolingual data than parallel data, which typically gets thrown away in neural systems because we have no idea how to incorporate it (other than as a feature, but that's blech). What they do here is, assuming you have translation systems in both directions, back translate your monolingual target-side data, and then use that faux-parallel-data to train your MT system on. Obvious question is: how much of the improvement in performance is due to language modeling versus due to some weird kind of reverse-self-training, but regardless the answer, this is a really cool (if somewhat computationally expensive) answer to a question that's been around for at least five years. Oh and it also works really well.
P16-1018: E.Dario Gutierrez; Ekaterina Shutova; Tyler Marghetis; Benjamin Bergen
Literal and Metaphorical Senses in Compositional Distributional Semantic Models
I didn't see this paper presented, but it was suggested to me at Monday's poster session. Suppose we're trying to learn representations of adjective/noun pairs, by modeling nouns as vectors and adjectives as matrices, evaluating on unseen pairs only. (Personally I don't love this style, but that's incidental to the main ideas in this paper.) This paper adjusts the adjective matrices depending on whether they're being used literally ("sweet candy") or metaphorically ("sweet dreams"). But then you can go further and posit that there's another matrix that can transform literal metaphors into metaphorical metaphors automatically, essentially implementing the Lakoff-style notion that there is great consistency in how metaphors are created.
P16-1030 [dataset]: Hannah Rashkin; Sameer Singh; Yejin Choi
Connotation Frames: A Data-Driven Investigation
This paper should win some sort of award for thoroughness. The idea is that in many frames ("The walrus pummelled the sea squirt") there is implied connotation/polarity/etc. on not only the agent (walrus) and theme (sea squirt) of the frame but also tells us something about the relationship between the writer/speaker and the agent/theme (the writer might be closer to the sea squirt in this example, versus s/pummelled/fought/). The connotation frame for pummelled collects all this information. This paper also describes an approach to prediction of these complex frames using nice structured models. Totally want to try this stuff on our old plotunits data, where we had a hard time getting even a much simpler type of representation (patient polarity verbs) to work!
P16-1152: Artem Sokolov; Julia Kreutzer; Christopher Lo; Stefan Riezler
Learning Structured Predictors from Bandit Feedback for Interactive NLP
This was perhaps my favorite paper of the conference because it's trying to do something new and hard and takes a nice approach. At a high level, suppose you're Facebook and you're trying to improve your translation system so you ask users to give 1 star to 5 star ratings. How can you use this to do better translation? This is basically the (structured) contextual bandit feedback learning problem. This paper approaches this from a dueling bandits perspective where I show you two translations and ask which is better. (Some of the authors had an earlier MT-Summit paper on the non-dueling problem which I imagine many people didn't see, but you should read it anyway.) The technical approach is basically probabilitic latent-variable models, optimized with gradient descent, with promising results. (I also like this because I've been thinking about similar structured bandit problems recently too :P.)
P16-1231: Daniel Andor; Chris Alberti; David Weiss; Aliaksei Severyn; Alessandro Presta; Kuzman Ganchev; Slav Petrov; Michael Collins
Globally Normalized Transition-Based Neural Networks
[EDIT 14 Aug 2:40p: I misunderstood from the talk and therefore the following is basically inaccurate. I'm leaving this description and paper here on the list because Yoav's comment will make no sense otherwise, but please understand that it's wrong and, I hate to say this, it does make the paper less exciting to me. The part that's wrong is struck-out below.] There's a theme in the past two years of basically repeating all the structured prediction stuff we did ten years ago on our new neural network technology. This paper is about using Collins & Roark-style incremental perceptron for transition-based dependency parsing on top of neural networks. The idea is that label-bias is perhaps still a problem for neural network dependency parsers, like their linear grandparents.
Why do I like this? Because I think a lot of neural nets people would argue that this shouldn't be necessary: the network can do arbitrarily far lookahead into the future and therefore should be able to learn to avoid the label-bias problem. This paper shows that current techniques don't achieve that: there's a consistent win to be had by doing global normalization.
P16-2013: Marina Fomicheva; Lucia Specia
Reference Bias in Monolingual Machine Translation Evaluation
This paper shows pretty definitively that human evaluations against a reference translation are super biased toward the particular reference used (probably because evaluators are lazy and are basically doing ngram matching anyway -- a story I heard from MSR friends a while back). The paper also shows that this gets worse over time, presumably as evaluators get tireder.P16-2096: Dirk Hovy; Shannon L. Spruit
The Social Impact of Natural Language Processing
This is a nice paper summarizing four issues that come up in ethics that also come up in NLP. I mostly liked this paper because it gave names to things I've thought about off and on, but didn't have a name for. In particular, they consider exclusion (hey my ASR system doesn't work on people with an accent, I guess they don't get a voice), overgeneralization (to what degree are our models effectively stereotyping more than they should), over- and under-exposure (hey lets all work on parsing because that's what everyone else is working on, which then makes parsing seem more important...just to pick a random example :P), and dual-use (I made something for good but XYZ organization used it for evil!). This is a position/discussion-starting paper, and I thought quite engaging.
Feel free to post papers you like in the comments!