23 January 2006

Most Influential NLP Papers

I conducted a mini survey recently, asking people I knew what they thought were the most influential papers in NLP from the past two decades. Here are the wholly unscientific results, sorted from most votes and subsorted by author. Note that I only got responses from 7 people. I've not listed papers that got only one vote and have not included my personal votes.

(7 votes): Brown et al., 1993; The Mathematics of Statistical Machine Translation
(5 votes): Collins, 1997; Three Generative, Lexicalised Models for Statistical Parsing
(4 votes): Marcus, 1993 Building a large annotated corpus of English: the Penn Treebank
(3 votes): Berger et al., 1996; A maximum entropy approach to natural language processing
(2 votes): Bikel et al., 1997; An Algorithm that Learns What's in a Name
(2 votes): Collins, 2002; Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms
(2 votes): Lafferty et al., 2001; Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
(2 votes): Och, 2003; Minimum Error Rate Training for Statistical Machine Translation
(2 votes): Papineni et al., 2001; Bleu: a method for automatic evaluation of machine translation
(2 votes): Ratnaparkhi, 1999; Learning to Parse Natural Language with Maximum Entropy Models
(2 votes): Yarowsky, 1995; Unsupervised Word Sense Disambiguation Rivaling Supervised Methods

This seems to be dominated by parsing and techology/machine learning papers, with a smattering of MT thrown in for good measure. I feel that the former represent a sort of "lowest common demoninator." There were several singleton votes for various papers in summarization, language modeling, IR, and other fields, but, with the exception of the MT papers, getting >1 vote meant that you had to be useful to more than just your one community. It's also a bit surprising that Collins' parser is so high, but no other parsers got any votes (Charniak, Lin, etc.).

Feel free to say what you think is missing!

14 comments:

Jonathan said...

it's interesting - I think most of the authors and research areas I selected are represented in this list, however only one of the specific papers I mentioned (the yarowsky) is exactly cited. For example, I picked some other collins parsing paper, a different treebank paper, etc. Hal, is it correct to assume that you counted all these mentions as votes for the same work and then chose a representative paper? I think this is an appropriate thing to do, but it also indicates that people feel a contribution to the field comes not from a specific paper but from a specific research effort. Which seems valid. I wonder, to what degree did people agree on a particular paper? Was I an outlier, constantly picking the "wrong" title compared to others, or was there a more or less evenly distributed choice of particular titles? This would of course have much more meaning with more data points. Still, interesting to see. Also, how diverse were the backgrounds of those surveyed (as regards their NLP background)?

hal said...

yes, when multiple "similar" papers were selected, i attempted to merge. i listed as the "chosen" paper the one that i thought best represented this set :).

and yes, there's a strong bias. all the responses but one came from isi or penn, and we know these are well known for mt and parsing, respectively.

hal said...

i should mention that a lot of people cited nonexistent papers. ie., they said "Smith, 1999 on topic X" when there was no paper on topic X by Smith from 1999, but rather from 1998. so it's difficult to say what "consensus" would be :).

Anonymous said...

It is very insteresting. Maybe hal can make a vote link for all NLPers to list their own opions. :)

hal said...

i'd love to do this, but i tried and failed. blogger won't let you put javascript in the posts, which means no standardly hosted polls. also, most poll sites won't let people vote for, say, max 5 papers. you could only vote for one. i think this would lead to skewed results.

here's a solution: post in a comment your top five papers. i'll assemble the results and republish. feel free to be as brief as you want, so long as i can figure out what you mean. "brown 93" or "collins parsing model" are sufficiently specific. "that papers that learned how to use unlabeled data from the late 90s" is not.

Anonymous said...

I think Fernando Perira et al's paper "Distributional clustering of english words" was also very influential and is often cited.

Pradeep said...

This is a nice blog with very good information on NLP.I'll continue visiting it for more information.

Thanks,
Pradeep Aggarwal

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

tar said...

Мультфильмы онлайн

qishaya said...

one day i went shopping outside,and in an ed hardy store,I found some kinds of ed hardy i love most they are Your website is really good Thank you for the information ed hardy ed hardy ed hardy clothing ed hardy clothing ed hardy shoes ed hardy shoes don ed hardy don ed hardy ed hardy clothes ed hardy clothes ed hardy bags ed hardy bags ed hardy swimwear ed hardy swimwear ed hardy jeans ed hardy jeans ed hardy mens ed hardy mens Thank you for the information

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
hersey burada
sesliparti
seslisohbet odalari
Sesli adresi
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex