09 November 2010

Managing group papers

Every time a major conference deadline (ACL, NIPS, EMNLP, ICML, etc.) comes around, we usually have a slew of papers (>=3, typically) that are getting prepared.  I would say on average one doesn't make it, but that's par for the course.

For AI-Stats, whose deadline just passed, I circulated student paper drafts to all of my folks to solicit comments at any level they desired: anything from not understanding the problem/motivation to typos or errors in equations.  My experience was that it was useful, both for distributing some of my workload and getting an alternative perspective, and for keeping everyone abreast of what everyone else is working on.

In fact, it was so successful that two students suggested to me that I require more-or-less complete drafts of papers at least one week in advance so that this can take place.  How you require something like this is another issue, but the suggestion they came up with was that I'll only cover conference travel if this occurs.  It's actually not a bad idea, but I don't know if I'm enough of a hard-ass (or perceived as enough of a hard-ass) to really pull it off.  Maybe I'll try it though.

The bigger question is how to manage such a thing.  I was thinking of installing some conference management software locally (e.g., HotCRP, which I really like) and giving students "reviewer" access.  Then, they could upload their drafts, perhaps with an email circulated when a new draft is available, and other students (and me!) could "review" them.  (Again, perhaps with an email circulated -- I'm a big fan of "push" technology: I don't have time to "pull" anymore!)

The only concern I have is that it would be really nice to be able to track updates, or to have the ability for authors to "check off" things that reviewers suggested.  Or to allow discussion.  Or something like that.

I'm curious if anyone has ever tried anything like this and whether it was successful or not.  It seems like if you can get a culture of this established, it could actually be quite useful.


  1. For tracking changes and comments, you could try to use bug tracker software, like Trac. This would allow anyone to submit 'issues' with the paper, with automatic notifications sent to the author. The only part that is (I think) not handled is emailing everyone about a new draft.

  2. @Twan: Yes, I forgot to mention that... we could use Trac or rememberthemilk or something like that... but it's not integrated with the paper management software (START or HotCRP), which I think is a big disadvantage: it creates enough additional overhead that maybe it would make the whole endeavor fail.

  3. Here in Brazil a similar structure is very common among the humanities, with the difference that since their publications are not deadline-driven, the reviewing is round-robin (one week everybody reads A's paper/thesis/project/proposal, the next everyone reads B's work, etc).

    Some groups in the sciences and in CS specifically do manage to implement something similar, but it's quite rare (as far as I know only one of the 10+ labs in Unicamp has this system), and there's a lot of inertia. I try pushing for something similar, as it not only motivates people to deliver earlier but also trains people to read their work (and their colleagues' work) more critically, which really helps day-to-day lab discussions.

    Kudos to you if you make it work.

  4. What are the benefits of the conference management software?

    I would deal with it like a software project, with bug-tracking/issues for comments and source-control for versioning (so you get a diff of the .tex for free, and can set it up to email everyone on every commit).

  5. yeah, it's definitely helped me in writing papers. One fun idea I experimented with is having my friends in program analysis read my papers in NLP, and I found that they caught most of the things NLP people would spot (although they sometimes give field-specific advice, such as explicit contribution lists).

    I think you can encourage them with a carrot rather than a stick - they'll have a lower chance of rejection by doing this. It should work fine so long as it's motivated.

    I'm not so sure about a system for reviewing though; I prefer hard-copies so long as I can read their handwriting. Hard-copies can also take a lot less time for the review - just circle/bracket areas and say "awk", or draw contrasting links, etc.

    My last suggestion is that I've found extreme diminishing returns with multiple reviews. Something like 90% of the comments will overlap, so it ended up being a waste for me. Instead, reviewing different versions of the paper with different people can help because a person can only read it fresh once (similar to the reason we can't catch all of our own mistakes). I guess this goes back to hard-copies for me; non-digital is manageable because I do 1-2 people for the first phase, then (time permitting) 1-2 people for the second phase.

  6. @Alexandre: I hadn't thought of it, but you're right -- in a non deadline-driven field this would be way easier (in the sense that it seems like it would be the only reasonable way to work things).

    @Yoav: Hrm.... yeah, using something like Trac might work. I guess my worry is that I don't actually want to know about *all* commits -- I just want to know about "major" ones. Perhaps there's some software that'll let you do that...

    @Keith: Yeah, good points. I actually use a tablet so I can mark comments on papers and send electronic versions to students. This was probably the single best investment of my life :). (Ok not really.) Maybe low-tech is the right solution :).

  7. I like the idea of being a hard-ass the way you suggest, but I'm a terrible role model with regard to deadlines. Trying to improve, but old habits die hard. Anyway, something that might be useful to you and others in our community is latexdiff, http://www.ctan.org/tex-archive/support/latexdiff/, which takes old.tex and new.tex and generates a diff.tex that you can then pdflatex to see edits in a familiar Word-like "track changes" format. I just ran it on a dissertation today, to compare the defended version with the post-revision version, and it's a godsend.

  8. @hal, using git as your source control will allow for author(s) to commit locally as they go along, and push their work to a central repository once they have a stable version (bazaar should also work similarly, but I never tried). Or you can just go low-tech - use SVN and ask your students to email you once they commit a good version.

    those of us without a tablet find it helpful to print-mark-scan-email.

  9. nice post hal.

    in my research group we try to have a deadline 1-2 weeks in advance. we assign reviewers to the papers and each reviewer gets two papers. the reviewers write up real reviews and send them to the authors.

    we also have a big "program committee meeting" where we discuss the aspects of the papers that everyone can benefit from (writing tips, plotting, paper organization, etc.)

    i think it's a good process. some nice effects are: first year students who may not have written any papers yet are involved in the writing process; authors who are mired in their papers take a break and read something new; and (of course) drafts are done a little early, early enough usually to run one more simulation or replot the results.

    i don't manage it with anything fancy, but it might be a good idea. (i've made mistakes...)

  10. @Philip: latexdiff has been really useful for me, but mostly as a high-level picture of which areas have changed. I find that people read a latexdiff output differently than the real thing. It's probably related to the recent stories about how poor fonts on handouts improved test scores.

  11. markup.io might be an interesting tool to try... e.g.: http://markup.io/v/km8djmtmc58f.

    maybe even in conjunction with a latex to html converter of some kind.

  12. I just instituted the 'week ahead of time' policy. And I think it works well because the deadline is way more relaxed, students actually get to see the last minute edits that happen, and I have more time to tell people what to change, rather than changing it myself.

    The "problem" in my group is that people work on different things and have different deadlines, so it's often hard to get content-focused reviews (mostly high level stuff), but I think I will institute some kind of "PC" for the papers.

    As for tracking, I've long been tempted to overlay some software bug tracking system over SVN for papers, especially when minuscule todos pile up towards the end. I haven't figured out how to link it to my current SVN in a nice way. I don't think hotcrp is a useful solution for the reviewing though: it seems too heavyweight.

  13. To take the software metaphor a bit further, you are looking for code review software. Such a program enables reviewing specific changes in code. These programs are typically integrated with a version control system.
    Review Board is one such program. It integrates nicely with several version control systems. It is open source, so it may be possible to extend it to integrate with paper management software.
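
As an aside, the internal review round described in comment 9 (a group deadline a week or two early, with each reviewer getting two papers) can be sketched in a few lines of Python. This is only a rough sketch under assumptions that are not in the thread: the function name `assign_reviews`, the member names and the paper titles are all hypothetical, each paper is assumed to have a single author in the group, and papers are handed out greedily so that the least-reviewed drafts are picked first and nobody reviews their own.

```python
def assign_reviews(paper_authors, members, per_reviewer=2):
    """Give every group member `per_reviewer` papers to review.

    paper_authors: dict mapping paper title -> author (hypothetical data).
    members: list of everyone in the group, authors or not.
    Returns a dict mapping reviewer -> list of paper titles.
    """
    review_counts = {paper: 0 for paper in paper_authors}
    assignments = {}
    for reviewer in members:
        # Nobody reviews their own paper.
        eligible = [p for p, author in paper_authors.items() if author != reviewer]
        if len(eligible) < per_reviewer:
            raise ValueError(f"not enough papers for {reviewer} to review")
        # Prefer the papers with the fewest reviews so far; sorted() is
        # stable, so ties keep the original paper order.
        chosen = sorted(eligible, key=lambda p: review_counts[p])[:per_reviewer]
        for paper in chosen:
            review_counts[paper] += 1
        assignments[reviewer] = chosen
    return assignments


if __name__ == "__main__":
    # Hypothetical group: three authors plus one first-year student.
    papers = {"parsing": "ana", "tagging": "bo", "mt": "cy"}
    group = ["ana", "bo", "cy", "dee"]
    for reviewer, todo in assign_reviews(papers, group).items():
        print(reviewer, "reviews", ", ".join(todo))
```

Members without a paper of their own (first-year students, say) still get two drafts to read, which matches the point in comment 9 about involving them in the writing process. Whether the result then goes out by email, a bug tracker, or hard copy is a separate choice.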