natural language processing blog: ArXiV and NLP, ML and Computer Science

04 January 2010

ArXiV and NLP, ML and Computer Science

Arxiv is something of an underutilized resource in computer science. Indeed, many computer scientists seem not to even know it exists, despite it having been around for two decades now! On the other hand, it is immensely popular among (some branches of) mathematics and physics. This used to strike me as odd: arxiv is a computer service, so why haven't computer scientists jumped on it? Indeed, I spent a solid day a few months ago putting all my (well, almost all my) papers on arxiv. One can always point to "culture" for such things, but I suspect there are more rational reasons why it hasn't affected us as much as it has others.

I first ran into arxiv when I was in math land. The following is a cartoon view of how (some branches of) math research gets published:

1. Authors write a paper
2. Authors submit paper to a journal
3. Authors simultaneously post paper on arxiv
4. Journal publishes (or doesn't publish) paper

We can contrast this with how life goes in CS land:

1. Conference announces deadline
2. One day before deadline, authors write a paper
3. Conference publishes (or rejects) paper

I think there are a few key differences that matter. Going back up to the mathematician model, we can ask ourselves: why do they do #3 (post on arxiv at the same time as submitting)? It's a way to get the results out without having to wait for a journal to come back with a go/no-go response. Basically, in the mathematician model, arxiv is used for advertising while a journal is used for a stamp of approval (or correctness).

So then why don't we do arxiv too? I think there are two reasons. First, we think that conference turnaround is good enough -- we don't need anything faster. Second, it completely screws up our notions of blind review. If everyone simultaneously posted a paper on arxiv when submitting to a conference, we could no longer claim, at all, to be blind. (Please, I beg of you, do not start commenting about blind review versus non-blind review -- I hate this topic of conversation and it never goes anywhere!) Basically, we rely on our conferences to do both advertising and stamp of approval. Of course, the speed of conferences is mitigated by the fact that you sometimes have to go through two or three before your paper gets in, which can make it as slow as, or slower than, journals.

In a sense, I think that largely because of the blind thing, and partially because conferences tend to be faster than journals, the classic usage of arxiv is not really going to happen in CS.

(There's one other potential use for arxiv, which I'll refer to as the tech-report effect. I've many times seen short papers posted on people's web pages either as tech-reports or as unpublished documents. I don't mean tutorial-like things, like I have, but rather real semi-research papers. These are papers that contain a nugget of an idea, but for which the authors seem unwilling to go all the way to "make it work." One could imagine posting such things on arxiv. Unfortunately, I really dislike such papers. It's very much a "flag planting" move in my opinion, and it makes life difficult for people who follow. That is, if I have an idea that's in someone else's semi-research paper, do I need to cite them? Ideas are a dime a dozen: making it work is often the hard part. I don't think you should get to flag plant without going through the effort of making it work. But that's just me.)

However, there is one prospect that arxiv could serve that I think would be quite valuable: literally, as an archive. Right now, ACL has the ACL anthology. UAI has its own repository. ICML has a rather sad state of affairs where, from what I can tell, papers from ICML #### are just on the ICML #### web page and if that happens to go down, oh well. All of these things could equally well be hosted on arxiv, which has strong government support to be sustained, is open access, blah blah blah.

This brings me to a question for you all: how would you feel if all (or nearly all) ICML papers were to be published on arxiv? That is, if your paper is accepted, instead of uploading a camera-ready PDF to the ICML conference manager website, you instead uploaded to arxiv and then sent your arxiv DOI link to the ICML folks?

Obviously there are some constraints, so there would need to be an opt-out policy, but I'm curious how everyone feels about this....

22 comments:

Benoit Essiambre said...: What about CiteSeer?; 04 January, 2010 12:32
Bob Carpenter said...: Could you be a little more specific on the boundary you perceive between "flag planting" and "making it work"?

I'm all in favor of the Arxiv plus (open source) journal approach. I think it represents the best combination of (a) timeliness of sharing, (b) flag planting [no longer being an academic, it doesn't matter so much for me], and (c) long-term quality.

Conferences are no longer the fastest route to sharing, are always a pain to justify outside of CS for tenure and promotion cases, and given the pressure on writers and reviewers Hal notes, not prone to produce as high quality a result as a journal article (or even a tech report). Conference paper length bounds are also problematic for ideas that are smaller or larger than the de facto minimal publishable unit of 8 pages.

I love conferences. I'd just prefer they relinquish the role of gatekeeper and embrace the role of community builder. They used to work more that way in speech, though I hear they're tightening their acceptance rates, which I'm also told looks good on academic CVs.

I'll let you draw your own conclusions about the role of single- or double-blind reviewing and their feasibility or desirability in an Arxiv/journal world.; 04 January, 2010 12:55
Drago said...: arxiv has had for a really long while (e.g., since 1994?) a cmp-lg component (now cs.CL):

http://arxiv.org/list/cs.CL/recent; 04 January, 2010 13:25
Anonymous said...: Playing devil's advocate, what is the value of arXiv when I can just link PDF to my web page and count on Google to rapidly index it?; 04 January, 2010 14:37
hal said...: @Bob: Well, if I were a finger pointing type this would be a lot easier. As a very small example, take my topic models on a graph paper. The idea is really straightforward. I tried to make it work for a long time until I realized that it just plain didn't work unless you put edge weights on the graph that could be inferred. I could easily have put up a tech report with the original idea, but there's no way I could have known that to "make it work" you'd really need to deal with edge weights properly. Then, on top of that, there's all the effort of getting the data, writing the code, debugging, blah blah blah.

drago: sure, but hardly anyone uses it!

regehr: permanence. your web page might change url, you might move, or retire or whatever, but your papers should live on! plus, there are lots of people (mostly senior :P) who don't bother updating their web pages anymore, and then if the ICML web page goes down, hasta la vista paper!; 04 January, 2010 15:43
VJ said...: About your comment on tech-reports: if we ignore the credit-related issues, these reports can be seen as small pieces of unfinished research that could help another person's research. For instance, suppose a researcher made some observations while performing an experiment that he/she is not able to explain from a theoretical standpoint, perhaps due to a lack of background/resources. In this situation, do you think it would be reasonable to put those results into a tech-report?; 04 January, 2010 18:04
Ricardo Silva said...: The proceedings of ICML have been archived in the ACM Digital Library for quite a while now, and they all get an ISBN. (I've been publication chair for this conference a couple of times, and I was surprised that ACM does check those permission-to-publish forms in detail!)

Recently, it appears that full mirrors for the conference websites have been made available at the IMLS website. Certainly nothing like the books.nips website, but that's a start. UAI has recently started a major effort to make the whole series of proceedings available electronically, and perhaps ICML will follow.

I remember several conversations in the past about making ICML papers automatically available at arxiv, but it seems quite a pain: we either have to format all source files in a way that can be compiled by arxiv (hard to centrally coordinate with the authors), or we have to make a case for them to accept our PDFs (I have no idea how rigid they can be on rejecting PDFs. Perhaps somebody else can clarify).; 06 January, 2010 16:17
Robin said...: This is the first time I know about ArXiV. Thanks for the info.; 12 January, 2010 09:15

How do you feel about arxiving ICML?
	No, please don't put my paper on arxiv.
	I'm happy to have my paper on arxiv, but you should do it for me!
	I'm happy to upload my paper to arxiv.
