03 February 2007

To err is human, but what about researchers?

Errors happen and sometimes get into papers. A recent example is the JAIR paper I had with Daniel on Domain Adaptation last year. I actually didn't catch the error myself -- it was caught by someone who was reimplementing the technique. And it's a totally not-insignificant error: essentially, the update equation for the generative parameters is completely botched. If you look through the derivation in the Appendix, it's clear where the error crept in.

Thankfully, this sort of error is essentially a typo. That is, the error was introduced when I was typing up the paper, not when I was doing the research. This matters because it means that the implementation reflects the correct updates: only the paper has the mistake. So the experimental results from the paper are valid, provided that you rederive the updates yourself, or just ask me what they should be.

I'm writing this post because it's somewhat unclear what to do when such a thing arises. One temptation is to do nothing. I have to admit that I was completely embarrassed when this was pointed out to me. There was a part of me that wanted to ignore it. It seems that this is the wrong approach for a variety of reasons, not the least of which is to make sure that correct information does get out. The question, to some degree, is exactly how to do this. I have a blog, which means I can write an entry like this. I can also put an erratum on my web page that points out the errors (I'm writing this up as we "speak"). Given that this is a pub in an online journal, I believe I am able to submit updates, or at least additional appendices, which means that the "official version" can probably be remedied.

But what about conference pubs? If this had appeared in ACL and I didn't have a blog, the situation would be something different (ironically, an earlier version with the correct updates had been rejected from ACL because the derivations were omitted for space and two reviewers couldn't verify them). Also, what if someone hadn't pointed it out to me? I certainly wouldn't have noticed -- that paper was behind me. But then anyone who noticed the errors might dismiss the results on the grounds that they could assume that the implementation was also incorrect (it's not inconceivable that an erroneous implementation can still get good results). This would also not be good because the idea in the paper (any paper with such errors) might actually be interesting.

False things are published all the time. The STOC/FOCS community (i.e., theory community) has a handful of examples...for them, errors are easy to identify because you can prove the opposite of any theorem. I recall hearing of a sequence of several papers, each of which incrementally used results from the previous one, but the first was in error, putting the rest in error (I also recall hearing that many of the subsequent results could be salvaged, despite the ancestral mistake).

I don't know if there's a good solution, given our publication mechanisms (essentially, publish-once-then-appear-in-the-anthology). But I'm pretty sure mine is not the first paper with such errors. At least I hope not :).

8 comments:

David Gelbart said...

When I write a conference paper, I create an associated web page, and I say in the paper that the reader can find additional information at that page, or that I may use that page in the future for updates. This way I can make a correction instantly if I need to. The web page is also handy as a place I can use after publication to release source code, give pointers to new related work, and so on.

I put the web page in my research group's web space, rather than my personal web space, so it will still be available after I leave my group and my account is deleted.

The expected lifespan of the web page would be even longer if I could use an institutional-level web space through a service like DSpace. (I'm not sure DSpace has the required capability for post-submission editing, but I think this capability is either available or under discussion in the DSpace community. However, I don't think my university provides a DSpace service.)

Suresh Venkatasubramanian said...

I am dealing with the same problem right now. The prescribed approach is:

1. Send a mail to the PC chair for the conference notifying them of the error, fixes if any, related docs, and the text of an erratum to be added to the next proceedings.

2. Do all the website stuff that David mentioned.

3. If the paper was arXiv'd, then the comments on the updated version can incorporate this as well.

David Gelbart said...

I wonder if there is a way to have a paper annotated with a correction in the IEEE Xplore system?

Several years ago, I published an incorrect IEEE workshop paper, and at that time I didn't follow my current practice of linking to a web page in each paper. So I put a new PDF copy of the paper on my group's web site with a footer added on the bottom of the first page containing a URL for the correction. (I think this was better than simply mentioning the correction on my home page, since search engines will sometimes bring people directly to the PDF.) Still, people are likely to find the IEEE Xplore PDF instead. Perhaps the IEEE would be willing to stick the new PDF into the Xplore database for me?

David Gelbart said...

I checked with the IEEE and it's against their policy to replace the PDF. The policy is understandable. I hope at some point in the future the IEEE will start offering archiving of supplementary online material related to IEEE papers (such as source code or expanded tables of results), as Science and Nature do, and perhaps some kind of mechanism for noting corrections could be made part of that, but I don't expect that would apply to older papers.

I published a later paper which cites my IEEE paper and mentions the corrections, but it is not an IEEE paper (it's published by ISCA) and doesn't show up in citation tracking. I have been volunteering with ISCA to help an effort to have ISCA archives indexed by Google Scholar. If that happens, anybody who finds the IEEE paper through Google Scholar will be able to find the ISCA paper through the Google Scholar citation tracking, and that will put my mind greatly at ease.

David Gelbart said...

There's a discussion about this issue on reddit today:

http://programming.reddit.com/info/1d1d1/comments
