
04 January 2010

ArXiV and NLP, ML and Computer Science

Arxiv is something of an underutilized resource in computer science. Indeed, many computer scientists seem not to even know it exists, despite it having been around for two decades now! On the other hand, it is immensely popular among (some branches of) mathematics and physics. This used to strike me as odd: arxiv is a computer service, so why haven't computer scientists jumped on it? Indeed, I spent a solid day a few months ago putting all my (well, almost all my) papers on arxiv. One can always point to "culture" for such things, but I suspect there are more rational reasons why it hasn't affected us as much as it has others.

I first ran into arxiv when I was in math land. The following is a cartoon view of how (some branches of) math research gets published:

  1. Authors write a paper
  2. Authors submit paper to a journal
  3. Authors simultaneously post paper on arxiv
  4. Journal publishes (or doesn't publish) paper

We can contrast this with how life goes in CS land:

  1. Conference announces deadline
  2. One day before deadline, authors write a paper
  3. Conference publishes (or rejects) paper

I think there are a few key differences that matter. Going back to the mathematician model, we can ask ourselves: why do they do #3? It's a way to get the results out without having to wait for a journal to come back with a go/no-go response. Basically, in the mathematician model, arxiv is used for advertising while a journal is used for a stamp of approval (or correctness).

So then why don't we do arxiv too? I think there are two reasons. First, we think that conference turnaround is good enough -- we don't need anything faster. Second, it completely screws up our notions of blind review. If everyone simultaneously posted a paper on arxiv when submitting to a conference, we could no longer claim, at all, to be blind. (Please, I beg of you, do not start commenting about blind review versus non-blind review -- I hate this topic of conversation and it never goes anywhere!) Basically, we rely on our conferences to do both advertising and stamp of approval. Of course, the speed of conferences is offset by the fact that you sometimes have to go through two or three before your paper gets in, which can make them as slow as, or slower than, journals.

In a sense, I think that largely because of the blind thing, and partially because conferences tend to be faster than journals, the classic usage of arxiv is not really going to happen in CS.

(There's one other potential use for arxiv, which I'll refer to as the tech-report effect. I've many times seen short papers posted on people's web pages either as tech-reports or as unpublished documents. I don't mean tutorial-like things, like I have, but rather real semi-research papers. These are papers that contain a nugget of an idea, but for which the authors seem unwilling to go all the way to "make it work." One could imagine posting such things on arxiv. Unfortunately, I really dislike such papers. It's very much a "flag planting" move in my opinion, and it makes life difficult for people who follow. That is, if I have an idea that's in someone else's semi-research paper, do I need to cite them? Ideas are a dime a dozen: making it work is often the hard part. I don't think you should get to flag plant without going through the effort of making it work. But that's just me.)

However, there is one purpose that arxiv could serve that I think would be quite valuable: literally, as an archive. Right now, ACL has the ACL anthology. UAI has its own repository. ICML has a rather sad state of affairs where, from what I can tell, papers from ICML #### are just on the ICML #### web page and if that happens to go down, oh well. All of these things could equally well be hosted on arxiv, which has strong government support to be sustained, is open access, blah blah blah.

This brings me to a question for you all: how would you feel if all (or nearly all) ICML papers were to be published on arxiv? That is, if your paper is accepted, instead of uploading a camera-ready PDF to the ICML conference manager website, you instead uploaded to arxiv and then sent your arxiv DOI link to the ICML folks?

How do you feel about arxiving ICML?
No, please don't put my paper on arxiv.
I'm happy to have my paper on arxiv, but you should do it for me!
I'm happy to upload my paper to arxiv.

Obviously there are some constraints, so there would need to be an opt-out policy, but I'm curious how everyone feels about this....