15 May 2007

Whence JCLR?

Journal publication is not too popular for NLPers -- we tend to be a conference-driven bunch. While I couldn't care less about some arguments for journals (e.g., the folks on tenure committees like them), I do feel that they serve a purpose beyond simply acting as an archive (which things like arxiv.org, the ACL Anthology, citeseer, rexa, etc. do anyway). In particular, a journal paper is often a place where you get to really set the stage for your problem, describe your algorithms so that they're actually reimplementable, and go into serious error analysis. Certainly not every paper that appears in a *ACL should continue on to the journal path, but many times a handful of papers could be merged.

One significant problem is that we're currently really limited in our choice of publication venues. Computational Linguistics (MIT Press) is definitely the place to publish a journal paper if you can. Unfortunately, CL only puts out four issues per year, each with about 4-5 papers. Sure, there aren't hundreds of good papers per year, but I have to believe there are more than 16-20. Moreover, I don't feel that CL actually mirrors the *ACL proceedings -- there are many papers published in CL that I don't think match the general sensibilities of *ACL. In addition to the small number of CL papers, the turnaround time is quite slow. I was personally very impressed with my turnaround time two years ago (date of submission -> date of publication was about a year) and I know that Robert Dale (who's editing now) has done a lot to try to improve this. But still, a year is a long time. And I've heard of papers that take several years to get through. Finally, it's not open. I hate pay-for-access journals almost as much as I hate pay-for-access conference proceedings. Sure, if you attend an *ACL you get it for "free" and most universities have agreements, but this is more a principle thing than a practical thing.

Things were similar in machine learning land about six years ago (though in fact I think they were worse). The big journal there was Machine Learning (published by Springer). They had roughly the same problems, to the extent that a large fraction of the editorial board resigned to found the Journal of Machine Learning Research (JMLR). JMLR has since become very successful, publishes dozens of papers per year, and has incredibly quick turnaround (I have seen a journal version of a NIPS paper appear in JMLR before NIPS even happens). The creation of JMLR was greatly assisted by the SPARC group, which helps fledgling journals get off the ground.

I would love to see a similar thing happen in the NLP community. I, personally, cannot make this happen (I don't have enough weight to throw around), but in talking to colleagues (the majority of whom also don't have enough weight) this seems to be something that many people would be in favor of. I don't think it has to be a "JCLR is better than CL" sort of thing; I think it's very possible for both to co-exist, essentially serving slightly different purposes for slightly different communities. In particular, aside from fast turnaround and online pubs, some things that I would love to see happen with such a journal are: Strongly encouraged sharing of code/data (if one could build in some sort of copyright protection for private data, this would be even better since it would let more people share); and a built-in board for paper discussion (probably with membership); ability for authors to easily submit addenda.

A while back I went through the SPARC suggestions of how to begin such a thing and it's very non-trivial. But it's doable. And I'd be willing to help. The biggest thing that would be required would be a bunch of people with white hair who are willing to commit body-and-soul to such a move.


Anonymous said...

Hal wrote:

"Strongly encouraged sharing of code/data (if one could build in some sort of copyright protection for private data, this would be even better since it would let more people share); and a built-in board for paper discussion (probably with membership); ability for authors to easily submit addenda."

The sharing of code/data is interesting. I know journals like Bioinformatics get authors to publish the public location of any systems they are describing. However, since grad students are often responsible for maintaining the code, the associated web addresses often disappear as the students move on. Any online journal that encourages this should also provide the space to store it or force authors to use sourceforge or other permanent locations.

I really believe in an online discussion forum per paper. Anyone who has read an Annals of Statistics paper has surely enjoyed the solicited expert opinions and author responses -- they are often better than the original paper!

Anonymous said...

This just came through the mail from Priscilla. Maybe you should write something? :-)

"The ACL executive committee realizes that a number of initiatives of ACL members have led to the creation of very successful resources for our community. Such examples include the ACL Anthology, the Software Registry, NLTK, the ACL Wiki, etc.

We would like to encourage our members to consider other projects of similarly high value. To this end, we are soliciting short (1-2 pages, in plain text) proposals from ACL members. These should be submitted to radev@umich.edu by June 4, 2007.

Proposals will be judged by a subcommittee of the ACL exec based on the following criteria:

- novelty
- potential impact on the ACL community: e.g., who will be using the resource
- collaboration opportunities: e.g., building a resource by joint work from the community (a good example being the Wiki)
- sustainability: e.g., keeping the resource up to date with no involvement from the exec

A small number of proposals (2-3) will be selected and their authors will be given an opportunity to present them briefly (5 minutes) at the business meeting in Prague. The purpose of these presentations will be to gauge community interest and recruit volunteers for the actual implementation.

The ACL exec will work with the proposers on the logistics of the selected proposals. This may involve some small amounts of seed money to launch the projects.

Dragomir Radev, ACL secretary"

Mark Dredze said...

When you have a successful ACL paper and want to publish further results, where do you go? ACL/EMNLP/NAACL/CoNLL/etc. are nice because the papers are relatively short and to the point, but this means there aren't enough details for the really good papers. Sometimes you can turn to JMLR, but not for a lot of the ACL work. The CL journal has a different focus. Something similar to JMLR is really needed in this area.

I agree with Fernando and you about this. Why do you need someone "with gray hair"? I would say that as long as these people are supportive, then they don't need to be the leads on the project. I think the reaction to such an effort would be very positive from the type of people whose support you need. Beyond that, the work can be done by whoever has time and is willing. Senior people can advise if needed. Ryan's suggestion of responding to Priscilla's message is a good idea. I think you'd find other people willing to help with such an effort at ACL through this process. Once you find a group of interested people, hopefully someone with time can emerge to lead the effort.

Kevin Duh said...

I agree. You don't need gray hairs to do this. In fact, initiatives in all walks of life are often carried out by young folks who think non-traditionally; we just need endorsement/support from the more senior people. My sense is that ACL is a very open-minded community and this issue will definitely be put on the discussion table if someone brings it up. I'd be happy to help in any capacity (e.g. grunt work) if you decide to pursue this, Hal!

Anonymous said...

Journal versions of NLP papers would be great, and there would be no need to try to publish in linguistics/psychology/cognitive science journals, which are clearly not the most suitable place for computational discussion.

I'm with Hal about the gray hair. I hate to admit it, but gray hair is needed for "political" reasons. Moreover, gray-haired people not only have the weight, they also have more time.

hal said...

wow, lots of support :). i think ryan's absolutely right that there would need to be a repository somewhere for code/etc. the only problem is that sometimes code gets licensed and then you don't want it up in some place you don't have control over. it seems like this is a solvable problem, though. would just need some more thought.

it looks like people are in favor of submitting this as an idea to the ACL. i would be happy to do this. if you're interested in supporting (putting your name on the proposal i send in) or helping edit the proposal, i will construct a wiki page for some discussion. check back here later for a url. (should be up tonight.)

regarding hair: i feel that what such people lack in melanin (or follicles, as fernando points out) they (more than) make up for in experience and reputation. i feel that having senior people is important for two orthogonal reasons. one is to grant stature. something like JCLR will be successful only if people submit good papers to it. people are more likely to do this if they believe that important people are behind it. second, as an editorial board. youngsters like us are great at getting things done, but senior people tend to have a much better grasp of the big picture (compare paper reviews written by students to those written by full profs and you'll get a shocking wake-up call). i'm fully content to admit that i don't have nearly enough experience to see the full picture.

now, that's not to say that senior people need to run such a beast, but they at least need to be on board. and they need to be on board more than just nominally. they need to be willing to serve to help set direction, and willing to review.

Anonymous said...

I'm all in favor of trying new things every so often just to see how they turn out, but I suspect the main question is whether there are papers that aren't getting published right now that really ought to be. I'm not convinced that there are, but I'd be pleased to be shown to be wrong. There are already a number of CL-ish journals besides Computational Linguistics, such as Computer Speech and Language, and Grammars, and a couple of others I can't remember right now. Generally these all seemed like good ideas when they were proposed and they have published really important papers, but they haven't become "must reads" in the computational linguistics community. I suspect that the problem isn't the hair color of the editorial board.

Anonymous said...

Mark writes:

"are papers that aren't getting published right now that really ought to be. I'm not convinced that there are"

You might be right. Given the number of conferences we have, I believe most good work gets a venue. However, I also believe much of the best work at our conferences never makes it past the 8-page version simply because there is no incentive: nobody reads many of our current journals, or there is a year-long publishing cycle for those journals that people do read. An online journal such as this should give researchers the opportunity to quickly publish extended versions w/ more analysis, experiments and implementation details, all of which are usually condensed or eliminated because of the conference page limit. (I guess this is just restating what Mark already said above.)

Also, Computer Speech and Language, Grammars, Research on Language and Computation, Journal of Natural Language Engineering, etc. are all not open access journals. (I think, but I may be wrong about one or two). Biology has many open access journals that in a few short years had impact factors near or above their traditional subscription based counterparts. JMLR has the highest impact factor for any AI journal. So I think there is a lot of evidence that quick publication open access venues get good submissions that are read and cited frequently.

I think the primary advantage of what Hal is proposing is that it is open access and will provide modern means of evaluating the work, like discussion forums and code repositories, none of which are part of any of our current means of publication.

hal said...

I have set up a wiki page to discuss this. To stop spam you need an account to edit (email me for one). I'd love to get as much feedback as possible. If you add a comment, it would be useful (but not necessary) for you to add your name next to it.

Anonymous said...

I agree with many of the sentiments that have been expressed here. Certainly, an open access CL journal would be a good thing. However, there is a much simpler, and likely superior, method of generating such a thing: convert CL to an open access format. Coincidentally, I had been thinking of proposing just such a move to the ACL board as part of their "solicited projects" effort.

Converting CL to open access has many positive aspects. First, it doesn't proliferate journals, which Mark Johnson has noted is not clearly desirable. Second, it already has an established "brand" as the leading CL journal. Third, if it does turn out that there are more papers of CL quality than the current CL can publish with its traditional (page-limited issues, with a backlog for buffering) publishing mechanism, moving to open access (papers issued as accepted with no backlog) will allow more papers to be published in the journal. (JMLR, for instance, has increased its page count by a factor of about two over the last few volumes.)

CL is especially well-suited for being run as an open access journal. Editorial functions are already handled by a volunteer group of academics. Authors are already used to doing their own copy-editing and typesetting. Everyone involved is computer-savvy. And we are all aware of the ancillary benefits of having text openly available for all uses, not just the reading of articles but, for instance, using them as raw material for experimental language processing. It would be interesting to get Robert Dale's view on what MIT Press is contributing in terms of the running of the journal that would still be needed if the journal were to convert.

You may wonder why the editorial board of Machine Learning didn't take the same route. The answer is that it wasn't open to them. This was a Springer journal, owned and run by a commercial publisher, who owned in particular the most valuable asset of the journal, its brand name. For that reason, the editorial board had to set up a competing journal under a different name. But as far as I know, ACL owns the CL journal name and brand, and could decide to take it open access unilaterally.

Hal worries that running an open access journal is "very nontrivial". In fact, it should be no more difficult than running a closed access journal, which the CL editorial board already does. JMLR and JAIR provide the models. (Contra Hal's assumption, I don't think that SPARC helped JMLR that much in the initial setup. They are an excellent group though.)

One terminological quibble: The term "online journal" is a misnomer. All journals these days, including CL, are available online (though sometimes only to subscribers), and open access journals are often available in print form. The issue is not the mode of distribution of the bits (over a network or in trucks) but the nature of access limitations (open or closed). JMLR, for instance, is available in a print edition from Microtome Publishing (which I run). I'd be happy to offer Microtome's services to run the print edition of an open access CL if MIT Press no longer desires to. (They did JMLR's first four print volumes but decided to break their ties to JMLR at that point.) For more information on print and open access, see here.

I'd love to help out with a proposal to convert CL to open access. Am I gray-haired enough?

hal said...

i think what stuart says makes a ton of sense. i think it's better for the community if opening up CL and simultaneously modernizing it is possible. my opinion is that we should push for this. if it meets with strong resistance, we can revisit the idea of creating something new.

that said, i still encourage people to visit the wiki page and add comments. i think that many of the things under "differentiation" are things that would be great to incorporate into a repolished CL. IMO these should be part of a proposal.

i'm tempted to suggest that we (we=those that have interest in this) band together and put in a proposal at ACL to turn CL into an OA journal with all of the advancements such as recommendations, discussion boards, etc.

does anyone think this is a bad proposal? the only disadvantage i see is that if it gets smacked down then we probably won't have an open (in person) forum like that offered at ACL this year again for a while. (wow, i'd love to see a parser deal with that sentence!)

Unknown said...

There are a number of considerations around at the moment that might cause us to review the kind of existence that the Computational Linguistics journal has. This is likely to be a major discussion item at the editorial board meeting at this year's ACL, so all the comments raised here are very useful input to that discussion. I do think there is likely to be some value in keeping separate (a) the perceived problems that need addressing (not enough publication fora? too slow reviewing times? etc) and (b) the models being proposed as solutions (open access? online?).

I'll follow up on Stuart's request for info on what contribution MIT Press makes when I have enough spare minutes to provide a reflective response.


Drago said...

When I joined the exec last year as secretary, I made it clear that I will be actively seeking new initiatives for improving the organization, in particular with respect to publications. As Ryan points out, we now have an (ACL-exec approved) call for cool ideas and initiatives that the exec will consider at our next meeting. If you guys (Ryan, Fernando, Hal, ...) would put up a short proposal (1 page of plain text is enough) detailing your concerns and suggestions, we will make sure to consider them seriously. Not that the acl exec hasn't been already discussing this issue for months :))

Min said...

I tend to agree with Stuart also on the importance of keeping the CL brand name while ensuring that CL papers don't end up auto-archived as Julia mentioned in her email on the wiki.

While it may be that most good-quality NLP articles are finding a suitable venue (CL or not), opening CL to encompass a faster review time would allow graduate students about to graduate to have more of a chance of having their results published in a journal within a reasonable lifespan. Currently, most NLP journals don't really support the type of fast review times that would enable this to happen (unlike J(AI|ML)R). If students are graduating and not continuing on their thesis topic, I think the ACL/EMNLP papers end up being the finish rather than CL as their student lifetime expires.

I think this is too bad, really, as quite a lot of high-quality student work could go into CL if not for this problem.

Anonymous said...

Slow turnaround is perhaps the most important reason why journals don’t play a bigger role in our field, and a big part of this is the reviewing process. Computational Linguistics has been trying to do something about this for years, but journals have little leverage over reviewers. The problem is that it’s hard work to review a journal article properly (that’s why I’m procrastinating by writing this blog entry instead of doing my CL review due on Monday). I think a journal article review should take a lot of work: a journal article is supposed to cover all the details, and if the reviewers are doing their job they should check all these details. But that hard reviewing work is essentially uncompensated. (For example, I could be spending this weekend working on a NIPS submission, which could actually have an impact professionally.)

So maybe we should think about ways of compensating reviewers. Reviewing is service to the field and so does count for tenure and the like, but is weighted far below research and teaching. Perhaps CL could try to more prominently thank reviewers who have done a good job? Right now we publish a list of all reviewers; maybe we could also say how many articles each reviewer reviewed? Maybe also whether they did it on time? Even more controversially, I think Geoff Pullum once suggested that reviewers' names appear along with authors' names on published articles; that way they get some of the reflected glory from important articles they reviewed (and take some of the blame for errors that get through to publication). And most controversially of all, maybe we should pay reviewers? After all, publishers typically offer an honorarium for book proposal reviews, and these go/no-go reviews are usually much less effort than a detailed journal review.

Fernando Pereira said...

@Mark Johnson: How would you explain the turn-around of top bio journals? Are biomedical researchers less busy than we are? Think about your annual reviewing "budget". How many hours do you spend reviewing for ACL, EMNLP, NIPS, ICML? Add them up. How many journal reviews in there?
