10 March 2007

Reproducible Results

In an ideal world, it would be possible to read a paper, go out and implement the proposed algorithm, and obtain the same results. In the real world, this isn't possible. For one, if by "paper" we mean "conference paper," there's often just not enough space to spell out all the details. Even how you do tokenization can make a big difference! It seems reasonable that there should be sufficient detail in a journal paper to achieve essentially the same results, since there's (at least officially) not a space issue. On the other hand, no one really publishes in journals in our subfamily of CS.
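As a small illustration of how much tokenization alone can change things, here is a sketch that compares two plausible tokenizers on the same sentence. Both functions are hypothetical examples. Neither is claimed to be what any particular paper used.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    # Split only on runs of whitespace; punctuation stays attached to words.
    return text.split()


def punct_tokenize(text: str) -> list[str]:
    # Emit runs of word characters, and every other non-space character as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Don't panic: the U.S. team won 3-2."
a = whitespace_tokenize(sentence)
b = punct_tokenize(sentence)
print(len(a), a)
print(len(b), b)
```

The first tokenizer yields 7 tokens and the second yields 16. Only "the", "team" and "won" appear in both outputs, so any word-based features built on top of them would differ substantially.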

The next thing one can do is to release the software associated with a paper. I've tried to do this in a handful of cases, but it can be a nontrivial exercise. There are a few problems. First, there's the question of how polished the software you put out should be. Probably my most polished release is megam (for learning classifiers), and the least polished is DPsearch (code from my AISTATS paper). Writing all the documentation for megam took a very nontrivial amount of effort. As a result, I hope that people can use it. I have less hope for DPsearch: you'd really have to know what you're doing to rip the guts out of it.

Nevertheless, I have occasionally received copies of code like my DPsearch from other people (i.e., unpolished code) and have still been able to use them successfully, albeit only for ML stuff, not for NLP stuff. ML stuff is nice because, for the most part, it's self-contained. NLP stuff often isn't: first you run a parser, then you have to have WordNet installed, then you have to have 100MB of data files, then you have to run scripts X, Y and Z before you can finally run the program. The work I did for my thesis is a perfect example of this: instead of building all the important features into the main body of code, I implemented about half of them as Perl scripts that essentially added "columns" to a CoNLL-style input format. By the end, the input was something like 25-30 columns wide, and if any were missing or out of order, bad things would happen. As a result, it's a completely nontrivial exercise for me to release this beast. The only realistic option would be to remove the unimportant scripts, fold the important ones back into the main code, and then release that. But then there's no way the results would exactly match those from the paper/thesis.
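One way to make a column-appending pipeline like this fail loudly instead of silently is to name every column and have each step declare which named columns it reads. The following is a minimal sketch under that assumption. It is not what the thesis code did. The `Stage` class, the column names, and the toy features are all hypothetical.

```python
from typing import Callable

Row = list[str]


class Stage:
    """One hypothetical feature step: reads named columns and appends one new column."""

    def __init__(self, name: str, requires: list[str], fn: Callable[[dict[str, str]], str]):
        self.name = name          # name of the column this stage adds
        self.requires = requires  # columns that must already exist
        self.fn = fn              # maps a row (column name -> value) to the new value


def run_stages(header: list[str], rows: list[Row], stages: list[Stage]) -> tuple[list[str], list[Row]]:
    """Apply stages in order, raising on missing, duplicate, or misaligned columns."""
    header = list(header)
    rows = [list(row) for row in rows]  # copy so the caller's data is untouched
    for stage in stages:
        missing = [col for col in stage.requires if col not in header]
        if missing:
            raise ValueError(f"stage {stage.name!r} needs missing columns {missing}")
        if stage.name in header:
            raise ValueError(f"column {stage.name!r} already exists")
        for row in rows:
            if len(row) != len(header):
                raise ValueError(f"row has {len(row)} columns, header has {len(header)}")
            row.append(stage.fn(dict(zip(header, row))))
        header.append(stage.name)
    return header, rows


# Example: two toy stages over a two-column (word, POS) input.
lower = Stage("lower", ["word"], lambda r: r["word"].lower())
suffix3 = Stage("suffix3", ["lower"], lambda r: r["lower"][-3:])

header, rows = run_stages(["word", "pos"], [["The", "DT"], ["Cats", "NNS"]], [lower, suffix3])
print(header)
print(rows)
```

Running the stages in the wrong order (`suffix3` before `lower`) raises a `ValueError` immediately. Without the names, the mistake would silently produce a shifted column.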

I don't know of a solution to this problem. I suppose it depends on what your goal is. One goal is just to figure out some implementation details so that you can use them yourself. For this, it would be perfectly acceptable in, say, my thesis situation, to just put up the code (perhaps the scripts too) and leave it at that. There would be an implicit contract that you couldn't really expect too much from it (i.e., you shouldn't expect to run it).

A second goal is to use someone else's code as a baseline system to compare against. This goal matters less when common data is available, because you can compare to published results. But often you don't care about the common data and really want to see how it works on other data. Or you want to qualitatively compare your output to a baseline. This seems harder to deal with. If code goes up to solve this problem, it needs to be runnable. And it needs to achieve pretty much the same results as published; otherwise funny things happen ("so and so reported scores of X but we were only able to achieve Y using their code", where Y < X). This looks bad, but is actually quite understandable in many cases. Maybe the solution here is, modulo copyright restrictions and licensing problems (ahem, LDC), to just put up your model's output as well. This doesn't solve the direct problem, but maybe helps a bit. It also lets people see where your model screws up, so they can attempt to fix those problems.
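If the released output is simply one predicted label per token, aligned with the test data, a reader can both list the model's errors and locate exactly where a re-run of the code diverges from what was published. This is a minimal sketch that assumes that hypothetical one-label-per-line format and uses plain per-token accuracy as the metric. The labels below are invented toy data.

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of positions where the prediction matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError(f"length mismatch: {len(gold)} gold vs {len(pred)} predicted")
    if not gold:
        raise ValueError("no labels to score")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def errors(gold: list[str], pred: list[str]) -> list[tuple[int, str, str]]:
    """(position, gold, predicted) for every mistake the model makes."""
    if len(gold) != len(pred):
        raise ValueError("length mismatch")
    return [(i, g, p) for i, (g, p) in enumerate(zip(gold, pred)) if g != p]


def disagreements(released: list[str], rerun: list[str]) -> list[int]:
    """Positions where a re-run of the code differs from the author's released output."""
    if len(released) != len(rerun):
        raise ValueError("length mismatch")
    return [i for i, (a, b) in enumerate(zip(released, rerun)) if a != b]


# Toy data, invented purely for illustration.
gold = ["B-PER", "O", "B-LOC", "O"]
released = ["B-PER", "O", "B-ORG", "O"]
rerun = ["B-PER", "B-LOC", "B-ORG", "O"]

print(accuracy(gold, released), accuracy(gold, rerun))
print(errors(gold, released))
print(disagreements(released, rerun))
```

In this example the re-run scores 0.5 against the released output's 0.75. `disagreements` pins the whole gap to a single position, which turns an "X vs. Y" complaint into something concrete to investigate.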

Comments:

oren said...

The problem, as you state in your last sentence, is "It also lets people see where your model screws up"...

Now seriously, most of the time, even when you start coding in a very structured way after careful design and even put time into documentation, you get (bad) results and then just fix one small thing, then change something else that might improve results. Then you realize that another preprocessing step might be needed, so you write a quick script (Perl, Python, whatever) to do that. The submission date is approaching and you really don't have the time to bundle it all together....
Eventually, not only is the code not publishable, but even reusing it yourself takes a while, until you figure out what needs to be done...

Bob Carpenter said...

The problem with distributing "research" software derives mainly from a single factor: lack of automation. Take Hal's anecdote of "pre"-processing CoNLL data with cumulative Perl scripts, or Oren's comment that sometimes you make quick local changes. As long as there is a single (one-touch) top-level script/makefile that runs everything, a diligent reader can trace through where the code's going. And there's no way for the columns to get out of order, either.
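Here is a minimal sketch of the kind of one-touch top-level driver described above, written in Python rather than as a makefile. The step names and the placeholder commands, which only print a message, are hypothetical stand-ins for real preprocessing, feature-extraction, and training scripts.

```python
import subprocess
import sys

# Hypothetical steps. In a real release every script would be listed here, in order,
# so that nothing is run by hand and nothing can be skipped or reordered.
STEPS: list[tuple[str, list[str]]] = [
    ("tokenize", [sys.executable, "-c", "print('tokenize: ok')"]),
    ("features", [sys.executable, "-c", "print('features: ok')"]),
    ("train", [sys.executable, "-c", "print('train: ok')"]),
]


def run_all(steps: list[tuple[str, list[str]]]) -> list[str]:
    """Run each step in order; stop at the first step that exits with a nonzero status."""
    completed = []
    for name, argv in steps:
        print(f"== {name}: {' '.join(argv)}", flush=True)
        # check=True raises subprocess.CalledProcessError on a nonzero exit status,
        # which halts the pipeline instead of feeding bad output to the next step.
        subprocess.run(argv, check=True)
        completed.append(name)
    return completed


if __name__ == "__main__":
    run_all(STEPS)
```

Because the order lives in one list, a reader can see exactly what runs and when. A failing step stops everything rather than leaving half-built intermediate files behind.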

The second biggest factor that will help with software distribution is for people to write more readable code. Not more comments. I don't care if there are any comments in code if it's written to be read. That means variable names that are consistent and make sense, code broken down into subroutines with reasonable names, etc. Comments get out of date as often as they're useful in living code.

The third issue is learning to read software. Researchers don't get much practice at this, as they're usually writing their own one-off software rather than having to write re-usable software as part of a group. Sasha Caskey taught me to read code while we were integrating JavaScript into SpeechWorks' semantic parser; pair programming is a great way to learn this kind of thing (and it forces you to write cleaner code, too).

I'd recommend everyone pick up a copy of Hunt and Thomas's "The Pragmatic Programmer", read it, and follow their advice in your next project. Beck's "Extreme Programming" is also worth a read. Both are highly applicable to research programming.

As a Java example in NLP, check out our BioCreative and CoNLL submissions in LingPipe's CVS sandbox.

For a C++ example in collaborative filtering, check out Timely Development's Netflix code, which is an online SVD algorithm handling missing data. It's reconstructed from Simon Funk's algorithm sketch based on Genevieve Gorrell's paper. Ironically, I find Timely's code the easiest of all three sources to read.

P.S. This same thread just made the rounds at the Nodalpoint bioinformatics blog.

nikita said...

I'm a master's student in computer science. I'm interested in creating a compiler for natural languages like English. Is it possible? Are there any more interesting research problems to work on in NLP? Please help me.
nikita
