23 January 2006

Most Influential NLP Papers

I conducted a mini survey recently, asking people I knew what they thought were the most influential papers in NLP from the past two decades. Here are the wholly unscientific results, sorted by number of votes and then by author. Note that I only got responses from 7 people. I've not listed papers that got only one vote and have not included my personal votes.

(7 votes): Brown et al., 1993; The Mathematics of Statistical Machine Translation
(5 votes): Collins, 1997; Three Generative, Lexicalised Models for Statistical Parsing
(4 votes): Marcus et al., 1993; Building a large annotated corpus of English: the Penn Treebank
(3 votes): Berger et al., 1996; A maximum entropy approach to natural language processing
(2 votes): Bikel et al., 1997; An Algorithm that Learns What's in a Name
(2 votes): Collins, 2002; Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
(2 votes): Lafferty et al., 2001; Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
(2 votes): Och, 2003; Minimum Error Rate Training for Statistical Machine Translation
(2 votes): Papineni et al., 2001; Bleu: a method for automatic evaluation of machine translation
(2 votes): Ratnaparkhi, 1999; Learning to Parse Natural Language with Maximum Entropy Models
(2 votes): Yarowsky, 1995; Unsupervised Word Sense Disambiguation Rivaling Supervised Methods

This seems to be dominated by parsing and technology/machine learning papers, with a smattering of MT thrown in for good measure. I feel that the former represent a sort of "lowest common denominator." There were several singleton votes for various papers in summarization, language modeling, IR, and other fields, but, with the exception of the MT papers, getting >1 vote meant that you had to be useful to more than just your one community. It's also a bit surprising that Collins' parser is so high, but no other parsers got any votes (Charniak, Lin, etc.).

Feel free to say what you think is missing!

6 comments:

hal said...

yes, when multiple "similar" papers were selected, i attempted to merge. i listed as the "chosen" paper the one that i thought best represented this set :).

and yes, there's a strong bias. all the responses but one came from isi or penn, and we know these are well known for mt and parsing, respectively.

hal said...

i should mention that a lot of people cited nonexistent papers. i.e., they said "Smith, 1999 on topic X" when there was no paper on topic X by Smith from 1999, but rather from 1998. so it's difficult to say what "consensus" would be :).

Anonymous said...

It is very interesting. Maybe hal can make a vote link for all NLPers to list their own opinions. :)

hal said...

i'd love to do this, but i tried and failed. blogger won't let you put javascript in the posts, which means no standardly hosted polls. also, most poll sites won't let people vote for, say, max 5 papers. you could only vote for one. i think this would lead to skewed results.

here's a solution: post in a comment your top five papers. i'll assemble the results and republish. feel free to be as brief as you want, so long as i can figure out what you mean. "brown 93" or "collins parsing model" are sufficiently specific. "that paper that learned how to use unlabeled data from the late 90s" is not.

Anonymous said...

I think Fernando Pereira et al.'s paper "Distributional clustering of English words" was also very influential and is often cited.