This is the first in what I hope will be a long line of "Area Chair Summaries" from NLP-related conferences. If you would like to contribute one, please contact me!
For ACL this year, I was lucky to be the Area Chair for the summarization track. I know I've said it before, but we really got a ton of great papers this year. In the end, seven were accepted for presentation (note there are also some summarization-related papers that officially fell under "Generation" for which I was not the area chair). I would like to say that there was some sort of unifying theme this year, but there's not one that I can actually come up with. The papers were:
P08-1035 [bib]: Tadashi Nomoto
A Generic Sentence Trimmer with CRFs
P08-1036 [bib]: Ivan Titov; Ryan McDonald
A Joint Model of Text and Aspect Ratings for Sentiment Summarization
P08-1092 [bib]: Fadi Biadsy; Julia Hirschberg; Elena Filatova
An Unsupervised Approach to Biography Production Using Wikipedia
P08-1093 [bib]: Qiaozhu Mei; ChengXiang Zhai
Generating Impact-Based Summaries for Scientific Literature
P08-1094 [bib]: Ani Nenkova; Annie Louis
Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization
P08-1054 [bib]: Gerald Penn; Xiaodan Zhu
A Critical Reassessment of Evaluation Baselines for Speech Summarization
P08-1041 [bib]: Giuseppe Carenini; Raymond T. Ng; Xiaodong Zhou
Summarizing Emails with Conversational Cohesion and Subjectivity
Most of these you can guess the contents more-or-less by their titles, but here's a quick run-down. Nomoto's is probably the hardest to guess. I dare-say it actually sounds a bit boring from just the title; it leads me to think it's yet another sentence compression method. But if you start reading the paper, you find out all about compression under dependency structures, summarization of Japanese text, and an fairly thorough evaluation.
The Titov and McDonald paper attempts to model associations between fine-grained user reviews of restaurants (eg: how do you rate the food versus the ambiance?) and the actual text in the reviews. This enables them to produce summaries that are specific to particular aspects of the review.
Biadsi, Hircshberg and Filatova present a model for producing biographies, by trying to identify biography-like sentences from Wikipedia (a source that's gaining more an more attention these days). One aspect that I found most interesting here was that they attempt to do full-on reference resolution and referring expression generation. This has always been something I've been too scared to touch, but they actually present some results that show that it's worthwhile!
Mei and Zhai talk about a sentence-retrieval method for summarizing scientific documents, which they gathered from DBLP. They take advantage of citation sentences (called "citances" by Marti Hearst and also explored deeply by Simone Teufel) and make a "citation" language model. This language model is interpolated with the standard document language model to perform extraction. This allows them to extract sentences that readers care about, not what the author thinks you should care about. The key limitation, of course, is that it only works once the paper has been cited for a while. (It would be nice to see how many citations you need before it's worth doing this.)
The paper by Nenkova and Louis describes a model for predicting if a batch of documents is going to be difficult (for a machine) to summarize. This is akin to the notion of query-difficulty that you see in IR. The results are about what you'd expect, but it's nice to see them played out. In particular, you see that more cohesive sets of documents are easier to summarize. Read the paper for more.
Penn and Zhu look at how well Rouge works when trying to summarize speech. They look at both Switchboard data (telephone conversations) and lectures. They have many interesting findings that cast doubt not only on the role that Rouge plays in speech summarization, but also on what sort of baselines are reasonable for speech, and what role meta-data plays in speech summarization. If you care at all about the intersection of speech and summarization, this truly is a must-read.
Last but not least, Carenini, Ng and Zhou discuss the task of summarizing emails (following up on previously work by the same authors on roughly the same task). Since the most relevant past work appeared in WWW07, I'll actually comment on that too (maybe unknown to many readers). There is quite a bit of work here, that primarily aims at finding useful features for summarizing chains of emails. They look at things like cue words, semantic similarity, lexical similarity, "PageRank" measures and a handful of others. This is all done in a graph-based framework that is pieced together based on processing the email chain (from what I can tell, this is highly non-trivial).
After writing this, I think that maybe the connecting thread is that there's a lot of work that aims at doing summarization for things other than straight news stories. I think this is a
fantastic step for the community. Keep up the good work, guys!