23 January 2006

Most Influential NLP Papers

I conducted a mini survey recently, asking people I knew what they thought were the most influential papers in NLP from the past two decades. Here are the wholly unscientific results, sorted from most votes and subsorted by author. Note that I only got responses from 7 people. I've not listed papers that got only one vote and have not included my personal votes.

(7 votes): Brown et al., 1993; The Mathematics of Statistical Machine Translation
(5 votes): Collins, 1997; Three Generative, Lexicalised Models for Statistical Parsing
(4 votes): Marcus et al., 1993; Building a large annotated corpus of English: the Penn Treebank
(3 votes): Berger et al., 1996; A maximum entropy approach to natural language processing
(2 votes): Bikel et al., 1997; An Algorithm that Learns What's in a Name
(2 votes): Collins, 2002; Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
(2 votes): Lafferty et al., 2001; Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
(2 votes): Och, 2003; Minimum Error Rate Training for Statistical Machine Translation
(2 votes): Papineni et al., 2001; Bleu: a method for automatic evaluation of machine translation
(2 votes): Ratnaparkhi, 1999; Learning to Parse Natural Language with Maximum Entropy Models
(2 votes): Yarowsky, 1995; Unsupervised Word Sense Disambiguation Rivaling Supervised Methods

This seems to be dominated by parsing and technology/machine learning papers, with a smattering of MT thrown in for good measure. I feel that the former represent a sort of "lowest common denominator." There were several singleton votes for various papers in summarization, language modeling, IR, and other fields, but, with the exception of the MT papers, getting >1 vote meant that you had to be useful to more than just your one community. It's also a bit surprising that Collins' parser is so high, but no other parsers got any votes (Charniak, Lin, etc.).

Feel free to say what you think is missing!

8 comments:

Jonathan said...

it's interesting - I think most of the authors and research areas I selected are represented in this list, however only one of the specific papers I mentioned (the yarowsky) is exactly cited. For example, I picked some other collins parsing paper, a different treebank paper, etc. Hal, is it correct to assume that you counted all these mentions as votes for the same work and then chose a representative paper? I think this is an appropriate thing to do, but it also indicates that people feel a contribution to the field comes not from a specific paper but from a specific research effort. Which seems valid. I wonder, to what degree did people agree on a particular paper? Was I an outlier, constantly picking the "wrong" title compared to others, or was there a more or less evenly distributed choice of particular titles? This would of course have much more meaning with more data points. Still, interesting to see. Also, how diverse were the backgrounds of those surveyed (as regards their NLP background)?

hal said...

yes, when multiple "similar" papers were selected, i attempted to merge. i listed as the "chosen" paper the one that i thought best represented this set :).

and yes, there's a strong bias. all the responses but one came from isi or penn, and we know these are well known for mt and parsing, respectively.

hal said...

i should mention that a lot of people cited nonexistent papers. i.e., they said "Smith, 1999 on topic X" when there was no paper on topic X by Smith from 1999, but rather from 1998. so it's difficult to say what "consensus" would be :).

Anonymous said...

It is very interesting. Maybe hal can make a vote link for all NLPers to list their own opinions. :)

hal said...

i'd love to do this, but i tried and failed. blogger won't let you put javascript in the posts, which means no standardly hosted polls. also, most poll sites won't let people vote for, say, max 5 papers. you could only vote for one. i think this would lead to skewed results.

here's a solution: post in a comment your top five papers. i'll assemble the results and republish. feel free to be as brief as you want, so long as i can figure out what you mean. "brown 93" or "collins parsing model" are sufficiently specific. "that paper that learned how to use unlabeled data from the late 90s" is not.

Anonymous said...

I think Fernando Pereira et al.'s paper "Distributional Clustering of English Words" was also very influential and is often cited.

Pradeep said...

This is a nice blog with very good information on NLP. I'll continue visiting it for more information.

Thanks,
Pradeep Aggarwal
