12 July 2016

Some picks from NAACL 2016

Usual caveats: didn't see all talks, didn't read all papers, there's lot of good stuff at NAACL that isn't listed here! That said, here are some papers I particularly liked at NAACL, with some comments. Please add comments with papers you liked!

Automatic Summarization of Student Course Feedback by Luo, Liu, Liu and Litman.

Anyone who has taught has suffered the following dilemma. You ask students for feedback throughout the course, and you have to provide free text because if you could anticipate their problems, you'd have addressed them already. But now you have hundreds of responses that you can't quickly read through. This paper provides a dataset of such responses, together with summaries that you actually have time to read through. The approach is a reasonably standard ILP formulation, with some additional machinery to deal with the fact that the data is very sparse. The idea is to essentially induce a "bigram similarity" as part of the summarization problem. I like this paper because the problem is great (I think NLP should really push in the direction of helping learners learn and teachers teach!), the dataset is nice (if a bit small) and the approach makes sense. And they actually do a human evaluation, even for a short paper! Hooray!

A Latent Variable Recurrent Neural Network for Discourse Relation Language Models by Ji, Haffari and Eisenstein.

This paper combines the nice rich hypothesis classes you get with the "anonymous" latent variables in neural networks with the modeling power that one gets from the ability to marginalize "structured" latent variables from classic graphical models land. This is applied to document language modeling, in which the latent variables are discourse relations (in the PDTB sense). The model works well both for language modeling and for predicting implicit discourse relations and dialogue acts. (And, as one should expect from a paper with Eisenstein as an author, there are statistical significance tests!)

Structured Prediction with Output Embeddings for Semantic Image Annotation by Quattoni, Ramisa, Madhyastha, Simo-Serra and Moreno-Noguer.

If you want to label images with complex structured outputs, like play(dog,grass), you quickly run in to sparsity problems on the output. The proposal here is to decompose the outputs into embeddings (kind of like Vivek's work and Jags' work) and learning a bilinear model of inputs/outputs in that space. In general, I think there's a lot to be done in interesting modeling of structured output spaces, and this paper gives a new set of techniques in this space.

Deconstructing Complex Search Tasks: A Bayesian Nonparametric Approach for Extracting Sub-tasks by Mehrotra, Bhattacharya and Yilmaz.

The problem considered here is the following: if I want to go to ACL, I need to register, book a flight, book a hotel, find some attractions to go to while skipping sessions, look up all the good vegan restaurants in Berlin (hah!), etc. My overall task is going to ACL, but there are a number of highly related but different subtasks. The challenge is to infer these subtasks from search logs, so that you can provide better search support. The model is a Chinese Restaurant Process with word embeddings used to measuring similarities. And look, another short paper with a human evaluation!

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories by Mostafazadeh, Chambers, He, Parikh, Batra, Vanderwende, Kohli and Allen.

This paper introduces the task of story cloze: given a story prefix, predict which of two endings is the "correct" (natural) one. Eg: "Jim got his first credit card in college. He didn't have a job so he bought everything on his card. After he graduated he amounted a $10,000 debt. Jim realized that he was foolish to spend so much money." Then you have to decide on the right ending: "Jim decided to device a plan for repayment." (correct) versus "Jim decided to open another credit card." (incorrect). The data set was quite well constructed (lots of checks), and a large collection of baseline models are run for comparison. The data is available. I like this paper for the task and the very very careful data set collection. Since this was establishing baselines, I would really liked to have seen error bars on the results so we can have more reasonable future comparisons. I'm really tempted to try to annotate some of these with plot units, but that was really really painful the first time around; but I feel like that's a good explanatory theory for a lot of the examples shown in the paper.

Learning Global Features for Coreference Resolution by Wiseman, Rush and Shieber.

The basic idea here is to take a "local" coreference model, that makes decisions about assigning mentions to entities, and augment it with a "global" model that models the entire cluster of mentions. In past work, cluster-level features have been hard to define (e.g., what are the right features to extract over sets of arbitrary size?). What's the solution? Essentially, embed the clusters. I like this paper because I wanted to try to do something like this and Wiseman did it better than I would have. I think there's also an interesting interpretation of this work in terms of things like neural Turing machines/memory networks, in which we now have technology for learning to update memory (where "memory" here is the "clusters"). My only gripe is that, like most papers in the new wave of coreference, the older ML approaches have been somewhat forgotten (except for Vincent Ng who is unforgettable); I'm thinking in particular about the work on ACE, like that of Luo, Ittycheriah, Jing, Kambhatla and Roukos on Bell-tree coreference, which I don't think gets enough notice these days for introducing a lot of the ideas that make up modern coref.

Visual Storytelling by Huang, Ferraro, Mostafazadeh, Misra, Agrawal, Devlin, Girshick, He, Kohli, Batra, Zitnick, Parikh, Vanderwende, Galley and Mitchell.

The main observation in this paper is that how you caption images in a sequence is different from how you caption them in isolation. For instance, if you have a temporal sequence of images from, say, a wedding, the captions for them should tell a story. This paper primarily introduces a dataset and baseline approaches for addressing the problem in this dataset. I like this paper, even if you remove the image stuff, because it emphasizes the fact that stories have a different structure than just sequences of sentences. The image stuff gives a grounding that's interesting beyond that. One problem pointed out in the paper is that automatic metrics aren't great here, which is problematic, but not particularly surprising.

No comments: