24 July 2010

ACL 2010 Retrospective

ACL 2010 finished up in Sweden a week or so ago. Overall, I enjoyed my time there: the local organization was great, though we got hit with unexpected heat, so those of us who didn't feel like booking a room at the Best Western (hah! why would I have done that?!) had no A/C, and my room was about 28-30°C every night.

But you don't come here to hear about sweltering nights, you come to hear about papers. My list is actually pretty short this time. I'm not quite sure why that happened. Perhaps NAACL sucked up a lot of the really good stuff, or I went to the wrong sessions, or something. (Though my experience was echoed by a number of people (n=5) I spoke to after the conference.) Anyway, here are the things I found interesting.

  • Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, by Matthew Gerber and Joyce Chai (this was the Best Long Paper award recipient). This was by far my favorite paper of the conference. For all you students out there (mine included!), pay attention to this one. It was great because they looked at a fairly novel problem, in a fairly novel way, put clear effort into doing something (they annotated a bunch of data by hand), developed features that were significantly more interesting than the usual off-the-shelf set, and got impressive results on what is clearly a very hard problem. Congratulations to Matthew and Joyce -- this was a great paper, and the award is highly deserved.

  • Challenge Paper: The Human Language Project: Building a Universal Corpus of the World’s Languages, by Steven Abney and Steven Bird. Basically this would be awesome if they can pull it off -- a giant structured database with stuff from tons of languages. Even just having tokenization in tons of languages would be useful for me.

  • Extracting Social Networks from Literary Fiction, by David Elson, Nicholas Dames and Kathleen McKeown. (This was the IBM best student paper.) Basically they construct networks of characters from British fiction and try to analyze some literary theories in terms of those networks, and find that there might be holes in the existing theories. My biggest question, as someone who's not a literary theorist, is why did those theories exist in the first place? The analysis was over 80 or so books, surely literary theorists have read and pondered all of them.

  • Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish, by Reyyan Yeniterzi and Kemal Oflazer. You probably know that I think translating morphology and translating out of English are both interesting topics, so it's perhaps no big surprise that I liked this paper. The other thing I liked about this paper is that they presented things that worked, as well as things that might well have worked but didn't.

  • Learning Common Grammar from Multilingual Corpus, by Tomoharu Iwata, Daichi Mochihashi and Hiroshi Sawada. I wouldn't go so far as to say that I thought this was a great paper, but I would say there is the beginning of something interesting here. They basically learn a coupled PCFG in Jenny Finkel hierarchical-Bayes style, over multiple languages. The obvious weakness is that languages don't all have the same structure. If only there were an area of linguistics that studied how they differ.... (Along similar lines, see
    Phylogenetic Grammar Induction, by Taylor Berg-Kirkpatrick and Dan Klein, which has a similar approach/goal.)

  • Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation, by Michael Bloodgood and Chris Callison-Burch. The "trend" referenced in the title is that active learning always asymptotes depressingly early. They have turkers translate bits of sentences in context (i.e., in a whole sentence, translate the highlighted phrase) and get a large bang-for-the-buck. Right now they're looking primarily at out-of-vocabulary stuff, but there's a lot more to do here.
A few papers that I didn't see, but other people told me good things about:
At any rate, I guess that's a reasonably long list. There were definitely good things, but with a fairly heavy tail. If you have anything you'd like to add, feel free to comment. (As an experiment, I've turned comment moderation on to try to stop the spam. I'm not sure I'll keep it on indefinitely; I hadn't turned it on before because I always thought/hoped that Google would just start doing spam detection and/or putting up hard captchas to stop spam, but sadly they don't seem interested.)

6 comments:

D_K said...

Lauri Karttunen, with whom we had a chat yesterday, told us that the hottest topic in Uppsala was sentiment detection. Did you have a similar impression, and have you seen anything worthwhile on that matter?

Because of this, the committee might have rejected papers on other, not-as-hot NLP topics, like machine translation.

hal said...

@D_K: I didn't see many (any?) sentiment papers, so it's hard to say... I wasn't actively avoiding them, but they just didn't pique my interest enough to drag me out of another session. I think there were fewer parsing and MT sessions than in the past, which I can unequivocally say is a good thing. Diversity is good.

Kevin Duh said...

Nice list, Hal! Just to add to it, I also enjoyed the following papers:

a) Fine-Grained Tree-to-String Translation Rule Extraction (Xianchao Wu; Takuya Matsuzaki; Jun’ichi Tsujii) - statistical machine translation with HPSG.

b) Bootstrapping Semantic Analyzers from Non-Contradictory Texts (Ivan Titov; Mikhail Kozhevnikov) - a very interesting and challenging unsupervised semantic parsing problem.

c) Dynamic Programming for Linear-Time Incremental Parsing (Liang Huang; Kenji Sagae) - the title says it all.

d) Combining Data and Mathematical Models of Language Change (Morgan Sonderegger; Partha Niyogi) - models the evolution of stress change in English noun/verb pairs (e.g. "contract", "protest").

Shrey Agarwal said...

Hello,
I am a student from India pursuing my undergraduate studies in Computer Science and plan to do my MS from a foreign university in the field of NLP. Could you please tell me some of the universities which provide an MS program in this field?
Hoping to get some guidance from you on this matter.
Thanks! :)

hal said...

@kevin: thanks, those sound interesting and i don't think i saw any of them!

@shrey: a good place to start might be here: http://aclweb.org/aclwiki/index.php?title=List_of_NLP/CL_courses ... of course, i have to say that obviously the best place to go is UMD :).

Shrey Agarwal said...

Thanks! :)
If it isn't a lot to ask, could you also brief me on what universities look for in an application? Is it just recommendations, projects and the statement of purpose, or something else too?