I was recently asked whether I thought it would be a good idea for our conferences to explicitly require an error analysis to be performed and reported in papers. While this is perhaps a bit extreme (more on this later), there are at least two reasons why it would be desirable.
- When multiple techniques exist for solving the same problem and they get reasonably close scores, is that because they are making the same sorts of errors or different sorts? (A quick way to check is sketched in the snippet after this list.)
- If someone were to build on your paper and try to improve it, where should they look?
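To make the first question concrete, here is a minimal sketch (my own illustration, not anything from a particular paper) of measuring whether two systems fail on the same test items. The file names and the one-label-per-line format are hypothetical; the idea is just to compare the two systems' error sets directly rather than only their scores.

```python
# Sketch: do two systems make the same errors, or different ones?
# Assumes hypothetical files gold.txt, system_a.txt, system_b.txt,
# each with one label per line, aligned by test item.

def load_labels(path):
    """Read one label per line from a plain-text file."""
    with open(path) as f:
        return [line.strip() for line in f]

gold = load_labels("gold.txt")        # reference labels
sys_a = load_labels("system_a.txt")   # predictions from system A
sys_b = load_labels("system_b.txt")   # predictions from system B

# Indices of the test items each system gets wrong.
errors_a = {i for i, (g, p) in enumerate(zip(gold, sys_a)) if g != p}
errors_b = {i for i, (g, p) in enumerate(zip(gold, sys_b)) if g != p}

# Jaccard overlap of the two error sets: near 1 means the systems
# fail on the same items; near 0 means their errors are complementary.
union = errors_a | errors_b
overlap = len(errors_a & errors_b) / len(union) if union else 1.0
print(f"A errors: {len(errors_a)}, B errors: {len(errors_b)}, "
      f"overlap: {overlap:.2f}")
```

A low overlap despite similar scores is itself an interesting finding: it suggests the systems have different strengths and that combining them might help; a high overlap suggests a shared blind spot worth looking at by hand.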
This situation reminds me a bit of an excellent talk I saw a few years ago (at ACL or EMNLP in Barcelona, I think) by Mitch Marcus talking about some parsing stuff. I don't really remember much of his talk, except that he kept flashing a single slide that read "Look at the data, stupid." His argument was essentially that we're not going to be able to model what we want to model unless we really understand what's going on in the data representing the phenomena we're trying to study.
Another exercise that's good from this perspective is to do some data annotation yourself. It's perhaps even more painful than doing an error analysis, but it really drives home the difficulties of the task.
Getting back to the point at hand, I don't think it's feasible, or even necessarily advisable, to require all papers to include an error analysis. But I do think more papers should contain error analyses than actually do (including some of my own). In the universal struggle to fit papers within an eight-page limit, things have to get cut, and the error analysis seems to be the first thing to go (in that it gets cut before the paper is even written, typically by not being performed at all).
But, at least for me, when I read a paper, I want to know after the fact what I have learned. Occasionally it's a new learning technique. Occasionally it's some new useful features. Sometimes it's a new problem. But take the most popular problems that I don't work on (MT, parsing, language modeling, ASR, etc.): I really have no idea what problems remain open. I can guess (I think names in MT are hard, as is ordering; probably attachment and conjunctions in parsing; I have little idea in LM and ASR), but I'm sure that people who work on these problems (and I really mean work: like, you care about getting better systems, not just getting papers) know. So it would be great to see that knowledge in papers.