27 April 2007

EMNLP papers, tales from the trenches

First, EMNLP/CoNLL papers have been posted. I must congratulate the chairs for publishing this so fast -- for too many conferences we have to wait indefinitely for the accepted papers to get online. Playing the now-standard game of looking at top terms, we see:

  1. model (24)
  2. translat (17)
  3. base (16)
  4. learn (13) -- woohoo!
  5. machin (12) -- mostly as in "machin translat"
  6. word (11) -- mostly from WSD
  7. structur (10) -- yay, as in "structured"
  8. disambigu (10) -- take a wild guess
  9. improv (9) -- boring
  10. statist, semant, pars, languag, depend, approach (all 7)
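The "top terms" game above is just a stemmed word count, presumably over the accepted-paper titles. A minimal sketch of that count (the crude suffix-stripper here only approximates a real stemmer such as Porter's, and the titles are invented for illustration):

```python
from collections import Counter
import re

# Crude suffix stripper standing in for a real (e.g., Porter) stemmer --
# just enough to collapse related forms onto a shared stem.
SUFFIXES = ["ation", "ing", "ion", "ed", "es", "e", "s"]

def crude_stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 4:
            return word[: len(word) - len(suf)]
    return word

def top_terms(titles, n=10):
    """Count stemmed words across a list of titles, most frequent first."""
    counts = Counter()
    for title in titles:
        for w in re.findall(r"[a-z]+", title.lower()):
            counts[crude_stem(w)] += 1
    return counts.most_common(n)
```

For example, `top_terms(["Statistical Machine Translation", "Machine Translation by Learning"])` puts the stems of "machine" and "translation" at the top with count 2 each.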
EMNLP/CoNLL this year was the first time I served as area chair for a conference. Overall it was a very interesting and enlightening experience, and it has drastically changed my view of the conference process. I must say it was also a very good experience: Jason Eisner did a fantastic job of keeping things on track.

I want to mention a few things that I noticed about the process:
  1. The fact that reviewers only had a few weeks, rather than a few months to review didn't seem to matter. Very few people I asked to review declined. (Thanks to all of you who accepted!) It seems that, in general, people are friendly and happy to help out with the process. My sense is that 90% of the reviews get done in the last few days anyway, so having 3 weeks or 10 weeks is irrelevant.

  2. The assignment of papers to reviewers is hard, especially when I don't personally know many of the reviewers (this was necessary because my area was broader than my own expertise). Bidding helps a lot here, but the process is not perfect. (People bid differently: some mark "want" on only 3 papers, while others "want" half the pool...) If done poorly, this can lead to uninformative reviews.

  3. Papers were ranked 1-5, with half-points allowed if necessary. None of my papers got a 5. One got a 4.5 from one reviewer. Most got between 2.5 and 3.5, which is highly uninformative. I reviewed for AAAI this year, where you had to give 1,2,3 or 4, which forces you to not abstain. This essentially shifts responsibility from the area chair to the reviewer. I'm not sure which approach is better.

  4. EMNLP asked for many criteria to be evaluated by reviewers; more than in conferences past. I thought this was very useful to help me make my decisions. (Essentially, in addition to high overall recommendation, I looked for high scores on "depth" and "impact.") So if you think (like I used to) that these other scores are ignored: be assured, they are not (unless other area chairs behave differently).

  5. Blindness seems like a good thing. I've been very back and forth on this until now. This was the first time I got to see author names with papers and I have to say that it is really hard to not be subconsciously biased by this information. This is a debate that will never end, but for the time being, I'm happy with blind papers.

  6. 20-25% acceptance rate is more than reasonable (for my area -- I don't know about others). I got 33 papers, of which three basically stood out as "must accepts." There were then a handful over the bar, some of which got in, some of which didn't. There is certainly some degree of randomness here (I believe largely due to the assignment of papers to reviewers), and if that randomness hurt your paper, I truly apologize. Not to belittle the vast majority of papers in my area, but I honestly don't think that the world would be a significantly worse place if only those top three papers had gotten in. This would make for a harsh 9% acceptance rate, but I don't have a problem with that.

    I know that this comment will probably not make me many friends, but probably about half of the papers in my area were clear rejects. It seems like some sort of cascaded approach to reviewing might be worth consideration. The goal wouldn't be to reduce the workload for reviewers, but to have them concentrate their time on papers that stand a chance. (Admittedly, some reviewers do this anyway internally, but it would probably be good to make it official.)
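The cascade could be sketched concretely as a two-stage protocol; everything here (the function names, the score bar of 2.0) is invented for illustration, not an actual proposal from the conference:

```python
def cascaded_review(papers, quick_read, full_review, bar=2.0):
    """Two-stage reviewing: a cheap pass first, full effort only on survivors."""
    # Stage 1: every paper gets a cheap quick-read score; clear rejects stop here.
    survivors = [p for p in papers if quick_read(p) >= bar]
    # Stage 2: reviewers concentrate their time on papers that stand a chance.
    return {p: full_review(p) for p in survivors}
```

The point of the design is that `quick_read` can be much cheaper than `full_review`, so if half the submissions are clear rejects, roughly half the expensive reviewing effort is saved.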

  7. Discussion among the reviewers was very useful. I want to thank those reviewers who added extra comments at the end to help me make the overall decisions (which then got passed up to the "higher powers" to be interpolated with other area chairs' decisions). There were a few cases where I had to recruit extra reviewers, either to get an additional opinion or because one of my original reviewers went AWOL (thanks, all!). I'm happy to say that overall, almost all reviews were in on time, and without significant harassment.
So that was my experience. Am I glad I did it? Yes. Would I do it again? Absolutely.

Thanks again to all my reviewers, all the authors, and especially Jason for doing a great job organizing.

12 comments:

Anonymous said...

Thanks for the post, but what can you say about the effect that sample selection has on each reviewer's scores?

I mean, obviously everyone is inclined (consciously or not) to produce a quasi-normal distribution over their scores -- a very strong paper usually raises the reviewer's standards when looking at the next one; you just can't help it! Of course there are clear accept or reject papers, and I believe what I am addressing here usually pertains to the in-between categories (you mentioned something related to this in your post).

Have you seen this effect in your area, perhaps by looking at how the reviewers of those top three papers assigned scores to other papers? Is it a problem, and how can it be solved?

It looks to me like a chicken-and-egg problem, since you need a rough idea about a paper before assigning it, which supports your cascading approach. Do you think knowing the authors of a paper would help in this cascading approach (at least at the area-chair scale, in assigning papers), or would it just add to the randomness and increase the bias?

Anonymous said...

In other words, whose role is it to smooth this variance, and how did you handle it yourself? It just feels overwhelming to go and read every single borderline paper -- and the problem gets exacerbated as you move up the reviewing hierarchy.

hal said...

There wasn't much variance in what I pushed forward as the top three papers. But that's somewhat of a chicken-and-egg statement: they may not have been the top three had they had high variance :). For good-looking papers that did have high variance, I asked the reviewers to discuss. Usually they did not change their scores (sometimes by 0.5 points), but the discussion was quite helpful in understanding why there was high variance. Typically it was because different reviewers actually *want* different things out of a paper.

I think it's important to separate out variance due to noise and variance due to reviewer preferences. I would venture that for the majority of the pretty good papers in the track (maybe 10ish), the primary variance that existed was due to reviewer preferences.

Of course, reviewer assignments introduce variance too, primarily because different reviewer biases can show up as variance. If you assign a theoretically strong but empirically weak paper to three empirically-minded reviewers, the variance will be artificially low.

Anonymous said...

Aha, I like the last paragraph of your reply. Looks like we must accept living with some randomness in the process.

Here is a suggestion.

I have always wondered why a reviewer needs to enter a final evaluation score if he/she has already entered a breakdown of the score along different criteria. Why don't we just have a given theme for the conference -- like what you mentioned before about ACL: this year we would like to stress innovative ideas as opposed to empirical rigor! The chair puts weights on each criterion, reviewers enter detailed scores, and that is it: the final recommendation falls out as a weighted average.

I think this would enforce more overall consistency and somehow downweight the reviewer-preference bias.
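As a concrete sketch of this suggestion (the criterion names and weights are invented for illustration; scores are on the usual 1-5 scale):

```python
# Hypothetical criteria with chair-chosen weights summing to 1.0; a real
# conference would pick its own criteria to match the year's theme.
WEIGHTS = {"novelty": 0.4, "depth": 0.3, "empirical": 0.2, "clarity": 0.1}

def overall_score(criteria_scores):
    """Weighted average replacing the reviewer's own overall recommendation."""
    return sum(WEIGHTS[c] * s for c, s in criteria_scores.items())

# One review's detailed scores (1-5 on each criterion):
review = {"novelty": 4.0, "depth": 3.5, "empirical": 3.0, "clarity": 4.5}
```

Here `overall_score(review)` comes out to 3.7; shifting the weights toward "novelty" is exactly the "stress innovative ideas" theme the chair would announce.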

Agree/disagree??

Anonymous said...

More on the topic. I really don't know why we always throw away such valuable data!

I mean, why don't we make these detailed/overall score data public (after some anonymization, for sure)? We could do cool stuff with it, like:

- try to fit an overall regressor and understand how people derive the final recommendation from the detailed ones.
- fit a mixture of regressors and see if we end up with meaningful semantic classes (empirically-minded people vs. theoretically-minded ones).
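A minimal sketch of the first bullet, fitting an ordinary least-squares regressor from detailed criterion scores to the overall recommendation (the numbers here are fabricated; real anonymized review data would replace them):

```python
import numpy as np

# Toy anonymized review data: each row is one review's detailed criterion
# scores, and y holds that reviewer's overall recommendation.
X = np.array([[4.0, 3.5, 3.0],
              [2.0, 2.5, 3.0],
              [5.0, 4.0, 4.5],
              [3.0, 2.0, 2.5]])
y = np.array([3.5, 2.5, 4.5, 2.5])

# Ordinary least squares with an intercept: how do reviewers implicitly
# weight each criterion when forming their overall score?
A = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The fitted `weights` are the per-criterion coefficients (plus intercept); the mixture-of-regressors variant would fit several such weight vectors and softly assign each reviewer to one.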

And the opportunities are endless here. Of course you don't need to disclose the papers themselves, although some sort of summaries (like word counts) would enable more cool stuff (topic models??).

At the least, automatic analysis that provides a good summary could help the chair(s) focus their effort in the right direction and better smooth the overall individual decisions!

hal said...

FYI, Fernando Pereira has something to say.

I've also wondered why they don't do what you suggest. I think people psychologically don't like mechanized reviewing, but that's not a good excuse. What seems to happen in practice is that the overall recommendations are used to filter out the top papers, then the ACs read/skim those and order them by whatever combination of weights they deem appropriate. Why the first step couldn't happen automatically, I don't know. It would be an interesting multilevel regression problem to see how people actually behave with respect to giving overall scores :).

Of course, this just shifts the bias to the area chair, so you'd better hope the area chairs have good biases, or that the higher-ups can instill good ones. Importantly, though, the ACs get a much more global picture than any single reviewer, so the bias effect is probably somewhat mitigated.

hal said...

Yay simultaneous posts.

I have many thoughts along the lines of making review information public, but I think that will have to wait for another post. The short answer is that people fear "repercussions" from various forms of non-anonymity (nymity?).
