natural language processing blog: How to reduce reviewing overhead?

29 May 2009

How to reduce reviewing overhead?

It's past most reviewing time (for the year), so somehow conversations I have with folks I visit tend to gravitate toward the awfulness of reviewing. That is, there is too much to review and too much garbage among it (of course, garbage can be slightly subjective). Reviewing plays a very important role but is a very fallible system, as everyone knows, both in terms of precision and recall. Sometimes there even seems to be evidence of abuse.

But this post isn't about good reviewing and bad reviewing. This is about whether it's possible to cut down on the sheer volume of reviewing. The key aspect of cutting down reviewing, to me, is that in order to be effective, the reduction has to be significant. I'll demonstrate by discussing a few ideas that have come up, and some notes about why I think they would or wouldn't work:

1. Tiered reviewing (this was done at ICML this year). The model at ICML was that everyone was guaranteed two reviews, and only a third if your paper was "good enough." I applaud ICML for trying this, but as a reviewer I found it useless. It means that at most 1/3 of reviews are getting cut (assuming all papers are bad), but in practice it's probably more like 1/6 that get reduced. This means that if on average a reviewer would have gotten six papers to review, he will now get five. First, this is a very small decrease. Second, it comes with an additional swapping overhead: effectively I now have to review for ICML twice, which makes scheduling a much bigger pain. It's also harder for me to be self-consistent in my evaluations.
2. Reject without review (this was suggested to me at dinner last night: if you'd like to de-anonymize yourself, feel free in comments). Give area chairs the power that editors of journals have to reject papers out of hand. This gives area chairs much more power (I actually think this is a good thing: area chairs are too often too lazy in my experience, but that's another post), so perhaps there would be a cap on the number of rejections without review. If this number is less than about 20%, then my reviewing load will drop in expectation from 5 to 4, which, again, is not a big deal for me.
3. Cap on submissions (again, a suggestion from dinner last night): authors may only submit one paper to any conference on which their name comes first. (Yes, I know, this doesn't work in theory land where authorship is alphabetical, but I'm trying to address our issues, not someone else's.) I've only twice in my life had two papers at a conference where my name came first, and maybe there was a third where I submitted two and one was rejected (I really can't remember). At NAACL this year, there are four such papers; at ACL there are two. If you assume these are equally distributed (which is probably a bad assumption, since the people who submit multiple first author papers at a conference probably submit stronger papers), then this is about 16 submissions to NAACL and 8 submissions to ACL. That is maybe 1-4% of submitted papers: again, something that won't really affect me as a reviewer (this, even less than the above two).
4. Strong encouragement of short papers (my idea, finally, but with tweaks from others): right now I think short papers are underutilized, perhaps partially because they're seen (rightly or wrongly) as less significant than "full" papers. I don't think this need be the case. Short papers definitely take less time to review. A great "short paper tweak" that was suggested to me is to allow only 3 pages of text, but essentially arbitrarily many pages of tables/figures (probably not arbitrarily, but at least a few... plus, maybe just make data online). This would encourage experimental evaluation papers to be submitted as shorts (currently these papers typically just get rejected as being longs because they don't introduce really new ideas, and rejected as shorts because it's hard to fit lots of experiments in four pages). Many long papers that appear in ACL could easily be short papers (I would guesstimate somewhere around 50%), especially ones that have the flavor of "I took method X and problem Y and solved Y with X (where both are known)" or "I took known system X, did new tweak Y and got better results." One way to start encouraging short papers is to just have an option that reviews can say something like "I will accept this paper as a short paper but not a long paper -- please rewrite" and then just have it accepted (with some area chair supervision) without another round of reviewing. The understanding would have to be that it would be poor form as an author to pull your paper out just because it got accepted short rather than accepted long, and so authors might be encouraged just to submit short versions. (This is something that would take a few years to have an effect, since it would be partially social.)
5. Multiple reviewer types (an idea that's been in the ether for a while). The idea would be that you have three reviewers for each paper, but each serves a specific role. For instance, one would exclusively check technical details. The other two could then ignore these. Or maybe one would be tasked with "does this problem/solution make sense." This would enable area chairs (yes, again, more work for area chairs) to assign reviewers to do things that they're good at. You'd still have to review as many papers, but you wouldn't have to do the same detailed level of review for all of them.
6. Require non-student authors on papers to review 3 times as many papers as they submit to any given conference, no exceptions ("three" because that's how many reviews they will get for any given paper). I know several faculty who follow the model of "if there is a deadline, we will submit." I don't know how widespread this is. The idea is that even half-baked ideas will garner reviews that can help direct the research. I try to avoid offending people here, but that's what colleagues are for: please stop wasting my time as a reviewer by submitting papers like this. If they get rejected, you've wasted my time; if they get accepted, it's embarrassing for you (unless you take time by camera ready to make them good, which happens only some of the time). Equating "last author" = "senior person", there were two people at NAACL who have three papers and nine who have two. This means that these two people (who in expectation submitted 12 papers -- probably not true, probably more like 4 or 5) should have reviewed 12-15 papers. The nine should probably have reviewed 9-12 papers. I doubt they did. (I hope these two people know that I'm not trying to say they're evil in any way :P.) At ACL, there are four people with three papers (one is a dupe with a three from NAACL -- you know who you are!) and eight people with two. This would have the added benefit of having lots of reviews done by senior people (i.e., no crummy grad student reviews) with the potential downside that these people would gain more control over the community (which could be good or bad -- it's not a priori obvious that being able to do work that leads to many publications is highly correlated with being able to identify good work done by others).
7. Make the job of the reviewer more clear. Right now, most reviews I read have a schizophrenic feel. On the one hand, the reviewer justifies his rating to the area chair. On the other, he provides (what he sees as) useful feedback to the authors. I know that in my own reviewing, I have cut down on the latter. This is largely in reaction to the "submit anything and everything" model that some people have. I'll typically give (what I hope is) useful feedback to papers I rate highly, largely because I have questions whose answers I am curious about, but for lower ranked papers (1-3), I tend to say things like "You claim X but your experiments only demonstrate Y" rather than "[that] + ... and in order to show Y you should do Z." Perhaps I would revert to my old ways if I had less to review, but this was a choice I made about a year ago.
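The load arithmetic in the list above can be made concrete with a small sketch. This is only an illustration: the function names `load_after_cut` and `reviews_owed` are hypothetical, and it assumes that cutting a given fraction of all reviews shrinks each reviewer's load by the same fraction, which is a simplification.

```python
def load_after_cut(base_load: float, fraction_cut: float) -> float:
    """Expected papers per reviewer once a fraction of all reviews is eliminated.

    Assumes (for illustration only) that the savings are spread evenly
    across reviewers.
    """
    return base_load * (1.0 - fraction_cut)


def reviews_owed(papers_submitted: int, reviews_per_paper: int = 3) -> int:
    """Reviews a non-student author would owe under the 3x rule."""
    return papers_submitted * reviews_per_paper


# Tiered reviewing: about 1/6 of reviews cut, starting from six papers.
print(f"tiered reviewing:      {load_after_cut(6, 1 / 6):.1f} papers")
# Reject without review, capped at about 20%, starting from five papers.
print(f"reject without review: {load_after_cut(5, 0.20):.1f} papers")
# Someone who submitted 4 or 5 papers would owe this many reviews.
print(f"3x rule: {reviews_owed(4)} to {reviews_owed(5)} reviews")
```

Under these assumptions each of the first two ideas saves only about one paper per reviewer, which is the sense in which neither reduction is significant on its own.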

I'd be interested to hear if others have additional "small changes" ideas. There are "large delta" ideas, such as Fernando's "everything is a journal" model, which I actually like, but is likely to be hard to implement because it's hard to make sweeping changes to a system (though VLDB -- or was it SIGMOD -- managed to do it).

I actually think that together, some of these ideas could have a significant impact. For instance, I would imagine 2 and 4 together would probably cut a 5-6 paper review down to a 3-4 paper review, and doing 6 on top of this would probably take the average person's review load down maybe one more. Overall, perhaps a 50% reduction in number of papers to review, unless you're one of the types who submits lots of papers. I'd personally like to see it done!

26 comments:

Kevin Duh said...: Hal, I'm glad you raised the subject. I enjoy reviewing tremendously, but recently I've been feeling overextended in my reviewing duties too--there are just many conferences and workshops in our area, and once they all add up, it's a significant amount of work.

Here are my thoughts about what you wrote:

1. Tiered review: I agree with you that tiered reviewing has a swapping overhead. Actually, I feel we have a similar overhead with our custom of long and short papers. Long papers have the earlier deadline and are reviewed first; short papers have a later deadline, so reviewing may happen again. Why can't we have the *same* deadline for long and short papers? I assume the short paper is meant for "late-breaking" work, so the deadline is supposed to be closer to the conference date, but with so many conferences (NAACL, ACL, EACL, IJCNLP, EMNLP) with staggered dates, you can always find an appropriate deadline where your short paper is considered "late-breaking."

2. Reject without review. I think this is a nice idea, but rather than letting the area chairs decide, this should be an option for the reviewer. It would be difficult for the area chair to read all papers. For a reviewer, reading the paper is actually not very time-consuming; the majority of the effort is in writing thoughtful reviews and double-checking the details. Perhaps we should make it so that "rejection without review" is a standard option on the review form, and that you will only need to give a 1-2 sentence justification for the area chair and the author.

3. Cap on submission: I think this isn't good; it penalizes productive researchers.

4. Encourage short papers: I think this won't work and would argue against it. Don't get me wrong, I think short papers are great for publishing exploratory work and definitely they should be just as well-regarded as long papers. My opinion is that short papers are short simply because it's exploratory and there isn't that much material, so allowing 3 pages of text with arbitrary number of pages for graphics doesn't do much--if I had that much graphics, I might as well make it a long paper (who can tell the difference?)

5. Multiple reviewer type: this is interesting.. but it may be hard to define reviewer roles in a clear-cut fashion.

6. Requiring non-student authors to review 3 times as many papers as they submitted: Wow. I think you might get a revolt if you enforce this. :)

7. Make the reviewer job more clear: I think this is an IMPORTANT point! Yes, reviews serve two purposes: (1) to help the area chair make accept/reject decision; (2) to give useful feedback to authors. This goes back to my point about "reject without review" which is basically (1).; 30 May, 2009 01:20
Kevin Duh said...: A related idea: I think the "pains" of reviewing might be alleviated if there is some sort of "goodies" as reward. Currently, reviewing is all volunteer, so in the face of "real work" vs "volunteer work", it feels tough to spend so much time on reviewing out of a sense of duty. However, if there is some sort of reward, then psychologically it would make the effort worthwhile.

The award can be public recognition. For example, I was pretty happy to get a bottle of wine at ACL2006 for my reviews. Another example is NIPS, where the "best reviewers" (rated by authors) are recognized at the opening session. This won't relieve any work, but at least will make the worker feel good. This should be pretty easy to implement, too.; 30 May, 2009 01:33
Jurgen Van Gael said...: Very interesting suggestions Hal! May I add one more: I feel that the whole reviewing process should be incentivized. I feel cheated when I have just spent a whole afternoon reviewing a paper, writing 2 pages of questions/suggestions and being honest about my confidence, only to see that someone else wrote a two line review with a very high confidence. It just doesn't feel right that that person didn't take his/her job seriously and gets as much credit for it as I do. Hence my proposal: make a centralized database where all PCs upload review scores etc. It would be trivial to write a little program that correlates review scores with metrics like citations or "Best Long Standing Result Award"s.

I can only see advantages: just as the quality of your research can be measured (in different ways), your service contribution can be evaluated. And let's be honest, setting up such a database for a cutting edge community as ourselves shouldn't be that big of a deal ...; 30 May, 2009 02:30
hal said...: Kevin --

1. The reason is two-fold: one is the "late breaking" thing (which only sort of works, because if you have a really late breaking great idea, you might just hold off for a long paper next year otherwise you get hit with double jeopardy) and, for instance at NAACL, that we want to have a deadline after the ACL rejects.

2. I think lots of reviewers effectively do do RwR :).

3. Agreed.

4. I dunno... I'm not totally convinced. I remember seeing several papers in the past few years that basically look like "we took system X and added Y as a preprocessor" and half the paper was dedicated to explaining system X, which was the author's previous work in the previous year's conference. There's no reason for this.

5. Agreed. I think it's clearer in theory land where you can have one reviewer whose sole responsibility is correctness.

6. Really? I actually thought this was the least controversial bit! (There are obvious corner cases: I have a submission to a systems conference with a student and a systems faculty member: you probably wouldn't want me reviewing there, so the systems faculty member would have to take my workload, unless they happen to get some ML papers.)

7. This would also be a way to split reviewers in (5).

Since this started out with a discussion of "pains" of reviewing, I think it's worth thinking why this is painful. It's painful not because reading papers is painful. It's painful because we have to read papers we don't want to read (either they're bad or they're just not in our area). Out of area papers take a long time to review because I have to learn about a new area, and usually a new vocabulary.

Incentives: I like small incentives, like reviewer awards and so on (and I'm especially fond of wine!), but at the end of the day, I still feel like we're a community, and should be willing to do service. There are a few things that I think we really should do here: area chairs rate reviewers and good reviewers are publicly endorsed (maybe with wine); reviewer feedback from area chairs -- when I A.C, I like to send email to all my reviewers at the end of the reviewing period telling them which papers made it in so that their work doesn't disappear into a black hole (I've heard from reviewers that they really like this, and I'd really like it, but I've never had it done to me: though I've had some other people tell me they'd just consider such an email to be spam). I also think some sort of institutional memory is a good idea here.

The problem I have with matching reviewers to papers and looking at citation rates over time is that paper acceptances are by definition asymmetric: a rejected paper cannot receive citations. This could lead to an even more conservative system, which I think is very bad. Plus, "impact of papers" is something that the bean counters in my dean's office have been trying to assess forever and they still can't do it, so it just shifts responsibility to an unsolved problem.; 30 May, 2009 05:16
Trevor Strohman said...: #4: Yes, please!

Other disciplines have very high impact publication modes for short papers. I think if CS had a publication venue for short papers with very stringent review requirements, we'd see an increase in the quality at this length.

Let's be realistic; most readers of this 8 page paper will skim it and try to extract the important points. Most people only read two of the eight pages anyway, so why not just publish those?

As an example of a good short paper, see:
www.tam.cornell.edu/tam/cms/manage/upload/SS_nature_smallworld.pdf; 30 May, 2009 10:38
Ryan McDonald said...: I think Kevin has a point that long and short papers should have the same deadline. I think this is part of the reason why these past couple of years the review load seems to have increased substantially. It would be interesting to see stats on how many papers were submitted twice, once as long and then again as a short paper. We should give area and program chairs the discretion to accept long papers as a short paper if they believe it does not need to be 8 pages long. This way it does not have to be re-submitted and re-reviewed by 3 people. I once submitted a 12 page paper to ECIR and it got accepted as a 6 page short paper. In hindsight, that was probably the right decision.

I like the idea of putting a cap on the text length of short papers, but a much looser restriction on figures and especially tables, i.e., you either get 4 pages total or 2-3 pages of text plus another 3-4 for tables. Actually, I think I suggested this to Hal, so I am now officially de-anonymized. Many times I have seen a long paper get rejected since there was no novel methodology, even though it was a nice experimental study. Then the paper again gets rejected as a short-paper because 4 pages is just not long enough for a comprehensive empirical study that needs space for tables, graphs, etc. In general I think our community undervalues experimental studies relative to incremental improvements in methodology.

Continuing one of Jurgen's thoughts: what about having conferences share more information about papers? If a paper has been rejected before, the area chair should know, be able to look at the reviews, and determine if the authors have addressed any of the problems. If not, reject without review. Perhaps we could even automate part of this process by checking whether the authors made any changes at all. This needs to be handled with care, so that area chairs don't see reviews of rejected papers they were authors on and old reviews should be anonymized. I guess this would not be needed if we do something like Fernando suggests and have rolling submissions to CL, where all accepted papers are presented at conferences. This way a paper is only reviewed once, unless the authors make substantial changes and re-submit. Btw, I like this idea. I think it will improve CL, force us to run more comprehensive experiments (since we have the space and the editors can ask and *enforce* improvements), and make it easier for tenure track faculty who have to explain why they only have conference papers.

Finally, even though I suggested it to Hal, I have to agree that a cap on submissions per author won't work, but for the reason Hal gives, not because it would hurt productive researchers. Remember, we are talking only about first author papers. In any given year we have roughly 4-6 conferences (ACL, NAACL, EACL, CoNLL, EMNLP, COLING, RANLP, ...) plus a number of conferences in related areas (ML, IR, DM, AI, ...). So unless you are publishing 10+ first author papers a year (which means you roughly spend 1 month on doing research per paper), this should not be a problem. Though, I think there is an argument to be made that this unfairly favors productive researchers who have the money to travel to 10 different conferences a year or that it hurts people aiming for tenure. But the underlying sentiment in suggesting caps is that we want researchers to put their best work forward.; 30 May, 2009 12:19
Jason said...: Hal, I like your idea of limiting short papers to three pages of text and providing more space to figures. However, I'd extend this to providing an extra half or full page dedicated to references. Many (most?) ACL and NAACL short papers only cite 4 or 5 papers.

The small number of citations is understandable given the space limitations, but tends to cause problems. I think this results in the temptation to self-cite or cite marginally related papers by probable reviewers at the expense of truly related work.; 30 May, 2009 13:45
Jenny Finkel said...: I guess I'm alone here, but I like that the short and long paper deadlines are separate, as both a paper writer and a reviewer. I don't think that making them be the same day will substantially reduce the number of submissions (maybe I'm wrong here?), but having them separate breaks up the reviewing, so that I have fewer reviews due at a time.

Another argument against submission caps (and I guess also pro separate submission dates for short/long papers) is that it gives you more chances to get a paper into a conference. Obviously you should only submit papers that you think are actually good, but sometimes you have more than one. And, for most grad students, your advisor will only fund your conference travel if you have a paper in the conference, but being a grad student you (or at least I) pretty much want to go to all of them. It would suck if you had two potential submissions, had to pick one, and then that one got rejected when maybe the other one wouldn't have.; 30 May, 2009 13:56
Fernando Pereira said...: "Everything is a journal" is now in process for the VLDB community with the Proceedings of the VLDB Endowment. Something like this would do way more for the ACL-sponsored conferences than more incremental tweaks, IMHO. Much reviewing overhead comes from peak load and from resubmissions to conference after conference without a persistent history trail. A journal-based model would solve both problems. ACL should do this, maybe through a new short communications journal.; 31 May, 2009 20:03
Bob Carpenter said...: Until we remove the incentive to junior faculty and grad students for lots of publications in low-acceptance rate venues, not much is going to stem the tide of submissions.

My own preference is for accepting just about everything at conferences and pushing more of the reviewing burden to journals. That's partly because I know reviewing has low precision and recall and partly because I like papers that are more interdisciplinary and speculative than most reviewers.

These days, I simply decline to do reviews for (a) high rejection rate conferences, and (b) closed source journals or conferences [of which there are still a surprising number in CS].; 01 June, 2009 15:50
John Langford said...: I'm generally with Fernando---we should go for big improvements. Some ideas that I'm toying with (but which may need tweaking).

(a) Requiring submission by Arxiv. This will make some people feel squeamish, because you can't retract a paper on Arxiv. However, that's a good thing: If an author isn't ready to have a paper permanently associated with them, then it's not ready to submit for review. I'm not sure how many papers are submitted in terrible form, but I expect this to help remove the worst.

(b) Open reviewing. Reviews for papers are not lost in resubmission at the next conference, although reviewers may remain anonymous. I say this despite suffering from some rather abusive reviews, which I'm sure make people cautious in supporting this idea. There are several good effects which can come about here: (1) Abusive reviewers will simply look like abusive reviewers if their reviews are public. In general, the quality of reviews will surely improve when they are public. Low quality reviews contribute to the problem in two ways: they cause good papers to be rejected (and resubmitted) and encourage bad papers to be submitted.
(2) the process of reviewing substantially does not need to be redone. I'm not sure what the average number of submissions before an accept is, but I wouldn't be surprised to learn that it's 2 or more, implying perhaps a factor of 2 savings.

(c) Reviewing in context. Have the authors declare who the paper should be of interest to. This is similar to the experiment that ICML did this year, but it could be more general. Have reviewers drawn from the group of interest. If the authors declare only a small group of interest, then consider that a reason to decline. This both makes reviewing easier, because reviewers only see papers of interest, and discriminates against papers of narrow interest reducing the number that need to be reviewed.

Some of the above is related to ideas that Yann LeCun and Yoshua Bengio suggested at AIStat.; 02 June, 2009 08:23
Daniel Roy said...: I like John's suggestions. Arxiv, or a similar permanent history of the paper, is a sure fire way to cut down on noise. However, I would like to add a "scaling law" that I think we should strive to achieve.

A reviewing system should be set up such that: if a paper is of interest to no one, it is not reviewed.

For example, say Mr. X posts a paper to Arxiv, talks about it on his blog, and N people read it. If some of these people care about the content, they will eventually uncover bugs in trying to understand it (also recognize that the readers don't assume it's correct necessarily). As bugs are found, they tell Mr. X who posts updates, possibly with attribution (addressing van Gael's idea). If there are interesting bugs or if Mr. X refuses to address concerns and there is a disagreement, these readers post other Arxiv papers citing the paper and some healthy discussion ensues. In the end, a paper which has generated a lot of interest and survived in the wild is ready to be published in a journal.; 08 June, 2009 23:38
