natural language processing blog: NAACL 2010 Retrospective

I just returned from NAACL 2010, which was simultaneously located in my home town of Los Angeles and located nowhere near my home town of Los Angeles. (That's me trying to deride downtown LA as being nothing like real LA.)

Overall I was pleased with the program. I saw a few talks that changed (a bit) how I think about some problems. There were only one or two talks I saw that made me wonder how "that paper" got in, which I think is an acceptable level. Of course I spend a great deal of time not at talks, but no longer feel bad about doing so.

On tutorials day, I saw Hoifung Poon's tutorial on Markov Logic Networks. I think Hoifung did a great job of targetting the tutorial at just the right audience, which probably wasn't exactly me (though I still quite enjoyed it myself). I won't try to describe MLNs, but my very brief summary is "language for compactly expressing complex factor graphs (or CRFs, if you prefer)." That's not exactly right, but I think it's pretty close. You can check back in a few months and see if there are going to be any upcoming "X, Y and Daume, 2011" papers using MLNs :P. At any rate, I think it's a topic worth knowing about, especially if you really just want to get a system up and running quickly. (I'm also interested in trying Andrew McCallum's Factorie system, which, to some degree, trades easy of use for added functionality. But honestly, I don't really have time to just try things these days: students have to do that for me.)

One of my favorite papers of the conference was one that I hadn't even planned to go see! It is Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables by Tetsuji Nakagawa, Kentaro Inui and Sadao Kurohashi. (I saw it basically because by the end of the conference, I was too lazy to switch rooms after the prvious talk.) There are two things I really like about this paper. The first is that the type of sentiment they're going after is really broad. Example sentences included things that I'd love to look up, but apparently were only in the slides... but definitely more than "I love this movie." The example in the paper is "Tylenol prevents cancer," which is a nice positive case.

The basic idea is that some words give you sentiment. For instance, by itself, "cancer" is probably negative. But then some words flip polarity. Like "prevents." Or negation. Or other things. They set up a model based on sentence level annotations with latent variables for the "polarity" words and for the "flipping" words. The flipping words are allowed to flip any sentiment below them in the dependency tree. Cool idea! Of course, I have to nit-pick the paper a bit. It probably would be better to allow arguments/adjuncts to flip polarity, too. Otherwise, negation (which is usually a leaf) will never flip anything. And adjectives/adverbs can't flip either (eg., going from "happy" to "barely happy"). But overall I liked the paper.

A second thing I learned is that XOR problems do exist in real life, which I had previously questioned. The answer came (pretty much unintentionally) from the paper The viability of web-derived polarity lexicons by Leonid Velikovich, Sasha Blair-Goldensohn, Kerry Hannan and Ryan McDonald. I won't talk much about this paper other than to say that if you have 4 billion web pages, you can get some pretty good sentimenty words, if you're careful to not blindly apply graph propagation. But at the end, they throw a meta classifier on the polarity classification task, whose features include things like (1) how many positive terms are in the text, (2) how many negative terms are in the text, (3) how many negations are in the text. Voila! XOR! (Because negation XORs terms.)

I truly enjoyed Owen Rambow's poster on The Simple Truth about Dependency and Phrase Structure Representations: An Opinion Piece. If you're ever taken a class in mathematical logic, it is very easy for me to summarize this paper: parse trees (dependency or phrase structure) are your languge, but unless you have a theory of that language (in the model-theoretic sense) then whatever you do is meaningless. In more lay terms: you can always push symbols around, but unless you tie a semantics to those symbols, you're really not doing anything. Take home message: pay attention to the meaning of your symbols!

In the category of "things everyone should know about", there was Painless unsupervised learning with features by Taylor Berg-Kirkpatrick, Alexandre Bouchard Côté, John DeNero and Dan Klein. The idea is that you can replace your multinomails in an HMM (or other graphical model) with little maxent models. Do EM in this for unsuperviesd learning and you can throw in a bunch of extra features. I would have liked to have seen a comparison against naive Bayes with the extra features, but my prior belief is sufficiently strong that I'm willing to believe that it's helpful. The only sucky thing about this training regime is that training maxent models with (tens of) thousands of classes is pretty painful. Perhaps a reduction like tournaments or SECOC would help bring it down to a log factor.

I didn't see the presentation for From baby steps to leapfrog: How "Less is More" in unsupervised dependency parsing by Valetin Spitkovsky, Hiyan Alshawi and Dan Jurafsky, but I read it. The idea is that you can do better unsupervised dependency parsing by giving your learner progressively harder examples. I really really really tried to get something like this to work for unsearn, but nothing helped and most things hurn. (I only tried adding progressively longer sentences: other ideas, based on conversations with other folks, include looking at vocabulary size, part of speech (eg., human babies learn words in a particular order), etc.) I'm thrilled it actually works.

Again, I didn't see Discriminative Learning over Constrained Latent Representations by Ming-Wei Chang, Dan Goldwasser, Dan Roth and Vivek Srikumar, but I learned about the work when I visited UIUC recently (thanks again for the invitation, Dan R.!). This paper does exactly what you would guess from the title: learns good discriminative models when you have complex latent structures that you know something about a priori.

I usually ask people at the end of conferences what papers they liked. Here are some papers that were spoken highly of by my fellow NAACLers. (This list is almost unadulterated: one person actually nominated one of the papers I thought shouldn't have gotten in, so I've left it off the list. Other than that, I think I've included everything that was specifically mentioned to me.)

Optimal Parsing Strategies for Linear Context-Free Rewriting Systems by Daniel Gildea.
Products of Random Latent Variable Grammars by Slav Petrov.
Joint Parsing and Alignment with Weakly Synchronized Grammars by David Burkett, John Blitzer and Dan Klein.
For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia by Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil and Lillian Lee.
Type-Based MCMC by Percy Liang, Michael I. Jordan and Dan Klein.

I think I probably have two high level "complaints" about the program this year. First, I feel like we're seeing more and more "I downloaded blah blah blah data and trained a model using entirely standard features to predict something and it kind of worked" papers. I apologize if I've just described your paper, but these papers really rub me the wrong way. I feel like I just don't learn anything from them: we already know that machine learning works surprisingly well and I don't really need more evidence of that. Now, if my sentence described your paper, but your paper additionally had a really interesting analysis that helps us understand something about language, then you rock! Second, I saw a lot of presentations were speakers were somewhat embarassingly unaware of very prominent very relevant prior work. (In none of these cases was the prior work my own: it was work that's much more famous.) Sometimes the papers were cited (and it was more of a "why didn't you compare against that" issue) but very frequently they were not. Obviously not everyone knows about all papers, but I recognized this even for papers that aren't even close to my area.

Okay, I just ranted, so let's end on a positive note. I'm leaving the conference knowing more than when I went, and I had fun at the same time. Often we complain about the obvious type I errors and not-so-obvious type II errors, but overall I found the program strong. Many thanks to the entire program committee for putting together an on-average very good set of papers, and many thanks to all of you for writing these papers!

10 comments:

Ryan McDonald07 June, 2010 20:56
Thanks for the retrospective Hal. I had a lot of fun at NAACL this year. It was really well organized. Kudos to the local organizers. Unfortunately there were way too many parallel sessions and not enough posters, IMHO.

I too enjoyed the dependency tree sentiment paper by Nakagawa et al. I suspect that it should be possible to extend their model to handle the case when modifiers are the negators.

For the Berg-Kirkpatrick et al. paper I still don't understand why the direct gradient is SO much better empirically. They say that the likelihood is about the same in practice. Normally I would just say they got lucky, but the fact that they observe it on so many different tasks suggests that something is going on here. I think I may have already brought this up with Hal in L.A., but if anyone has any thoughts about this I would love to hear them.

Another paper I liked was "Bayesian Inference for Finite-State Transducers" by Chiang et al. I would like to understand better why their averaging trick for model selection works amazingly well.
hal07 June, 2010 21:37
Ryan: For the Berg-Kirkpatrick paper, I want to know that, too. It seems like magic. Maybe they should talk to their next-door-neighbor, Percy Liang, who is the expert and understanding the errors that unsupervised models make :). More seriously, though, I kind of suspect we might never get to know, given that the likelihoods are the same. That is, I've thought about it a bunch and I can't come up with any concrete hypothesis, much less one that's testable. If the likelihoods were better as well, I could formulate some stuff to test.
Jochen L. Leidner09 June, 2010 00:11
I fully agree that more and more people are either not aware of related work/prior art or know about it but choose to ignore the topic, and it also bugs me.

Last year, I decided to give this more weight when reviewing submissions as a result. Every author should recognize and acknowledge the most closely related paper they know of, and the first work or early seminal work in the area, as well as the most recent and best results they know. Furthermore, authors should demonstrate they are aware of what their own contribution is, i.e. how they are different.
Mark Dredze09 June, 2010 19:10
Excellent summary. I had the same comments about the CRF sentiment paper. I found myself sitting in the talk unplanned and I was very pleased. I had actually come up with a very similar model with a student earlier in the semester and it was/is on our TODO list. I'm glad to see that it works!
tberg12 June, 2010 15:10
Great NAACL summary, Hal! We've got a hypothesis about why the direct gradient approach outperforms EM. It has to do with coarse features and how they're used during the initial stages of learning.

I wrote up a summary of the hypothesis and some new results that back it up, posted here. Come check it out, and let me know what you think.
Anonymous13 June, 2010 22:24
There ain't no reliability in XOR.
Jochen L. Leidner15 June, 2010 17:48
Another HLT-NAACL report.
Chintoz22 June, 2010 07:40
Thanks for this summary about NAACL. I haven´t known anything about thats conferences.
Now I'll look over the papers. This NLP area is very interesting, specially the topic of sense in a document. It´s fascinating.
combattery8405 July, 2010 00:00
Laptop battery
ACER Laptop Battery
ASUS Laptop Battery
COMPAQ Laptop Battery
Dell Laptop Battery
HP Laptop Battery
IBM Laptop Battery
SONY Laptop Battery
TOSHIBA Laptop Battery
APPLE M8403 battery
APPLE A1078 Battery
APPLE A1079 battery
APPLE A1175 battery
APPLE a1185 battery 1
APPLE A1189 battery
Acer aspire 5920 battery
Acer btp-arj1 battery
Acer LC.BTP01.013 battery

Acer ASPIRE 1300 battery
Acer ASPIRE 1310 battery
Acer Aspire 1410 battery
Acer ASPIRE 1680 battery
ACER BTP-63D1 battery
ACER BTP-43D1 battery
Acer lc.btp05.001 battery
Acer aspire 3000 battery
Acer Travelmate 4000 battery
ACER aspire 5560 battery
ACER BATBL50L6 battery
ACER TravelMate 240 Battery
ACER BT.00803.004 Battery
ACER Travelmate 4002lmi battery
Acer travelmate 800 battery
combattery8407 July, 2010 00:16
ACER Travelmate 4002lmi battery
Acer travelmate 800 battery
Acer aspire 3613wlmi battery
Travelmate 2414wlmi battery
Acer batcl50l battery
Acer Travelmate 2300 battery
ACER aspire 3610 battery
ACER travelmate 4600 battery
Dell Latitude D800 battery
Dell Inspiron 600m battery
Dell Inspiron 8100 Battery
Dell Y9943 battery
Dell Inspiron 1521 battery
Dell Inspiron 510m battery
Dell Latitude D500 battery
Dell Latitude D520 battery
Dell GD761 battery
Dell NF343 battery
Dell D5318 battery
Dell G5260 battery
Dell Inspiron 9200 battery
Dell Latitude C500 battery
Dell HD438 Battery
Dell GK479 battery
Dell PC764 battery
Dell KD476 Battery
Dell Inspiron 1150 battery

07 June 2010

NAACL 2010 Retrospective

10 comments: