Comments on natural language processing blog: Quality vs. quantity in data annotation

bob:there is a paper that suggests that active lea...

2007-05-31T02:37:00.000-06:00

bob:
there is a paper that suggests that active learning does indeed select harder examples for the human annotator:
Investigating the Effects of Selective Sampling on the Annotation Task. B. Hachey, B. Alex, and M. Becker. Proceedings of CoNLL 2005

Correction -- that was 1000 tokens/hour for entity...

2007-05-30T13:03:00.000-06:00

Correction -- that was 1000 tokens/hour for entity tagging by hand in our new interface, not 1000 tokens/second.

Nice analysis. It'd seem the next question here w...

2007-05-30T11:14:00.000-06:00

Nice analysis.

It'd seem the next question here would be how to best use a pool of annotators. Theoretically, you could make assumptions about the correlations of errors and subsequent need for adjudication.
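To make the independence assumption concrete, here is a minimal sketch of how majority voting over a pool would behave if annotator errors were completely uncorrelated. The function name `majority_accuracy` and the 80% per-annotator accuracy are illustrative assumptions, not measurements from any real annotation effort; correlated errors would make the pooled result worse than this.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n annotators is correct.

    Assumes (hypothetically) that every annotator is correct with the same
    probability p and that their errors are independent. n must be odd so
    that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Illustrative numbers only: each annotator is right 80% of the time.
for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.8, n), 3))
```

Under these assumptions, three annotators at 80% each give a majority label that is right about 89.6% of the time, and five give about 94.2%. The gap between that ideal and what you actually observe is one rough signal of how correlated the errors are, and so of how much adjudication is needed.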

I'm guessing that like every other task of this kind, there's an enormous interpersonal variation in the speed/quality relation and offset that swamps just about every other concern. A hierarchical model might make sense here.

In the end, you also have to make assumptions about the highly non-linear relationship between time and quality. This'd actually have to be measured if you wanted to optimize production from your annotation farm.

It makes a big difference on speed/accuracy if users can just ignore a very hard case. When I was annotating for stemming and call routing, this was a serious concern, as some of the cases were simply inscrutable.

And then there's all the interaction with active learning concerns, which themselves interact with evaluation (0/1 vs. log prob). My guess is that active learning probably picks harder problems in general than random selection.
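A rough sketch of why that guess seems plausible, assuming uncertainty sampling as the active learning strategy and assuming that model uncertainty is a reasonable proxy for annotator difficulty (both are assumptions, and the second is exactly the open question). The confidences below are random stand-ins, not output from a real model, and all function names are made up for illustration.

```python
import random


def uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - 2.0 * abs(p - 0.5)


def select_uncertain(probs, k):
    """Indices of the k items the (hypothetical) model is least sure about."""
    ranked = sorted(range(len(probs)), key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:k]


def select_random(probs, k, rng):
    """Indices of k items drawn uniformly at random, without replacement."""
    return rng.sample(range(len(probs)), k)


def mean_uncertainty(probs, indices):
    """Average uncertainty over the selected items."""
    return sum(uncertainty(probs[i]) for i in indices) / len(indices)


rng = random.Random(0)
# Stand-in for model confidences on an unlabeled pool (not real data).
probs = [rng.random() for _ in range(1000)]

active = select_uncertain(probs, 50)
rand = select_random(probs, 50, rng)
print("active:", mean_uncertainty(probs, active))
print("random:", mean_uncertainty(probs, rand))
```

By construction the uncertainty-sampled batch sits closer to the decision boundary than the random batch. Whether those items are also slower or more error-prone for a human is an empirical question that would have to be measured.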

Next up, there's learn-a-little, tag-a-little, and its effect on both accuracy (it'd seem natural that it'd induce some bias in errors), and time. Of course, there's the stage of training which interacts with all of this.

If you're looking at overall time, I think there are huge improvements to be made at the GUI level. My current NE setup, which we're about to release, lets me chunk bibliographic references into types (e.g. author, title, journal, year) at 1000 tokens/second. I can do newswire into person/location/organization more than twice that fast. You can drive it quickly using only a keyboard, in contrast to standard approaches based on text editors. It took me about a week to build the whole thing, including all the corpus management.

but on the practical side - can you control the n...

2007-05-30T07:06:00.000-06:00

but on the practical side - can you control the noise level? relaxing the annotation guidelines or telling the annotators that quantity is preferable might create way too much noise. the problem is even harder for soft classification, which is relevant to many nlp experiments.

fernando, thanks for the ref. it seems like a great paper, and it was just added to my growing to-read list

Possibly relevant paper: Learning from Data of Var...

2007-05-29T15:41:00.000-06:00

Possibly relevant paper: Learning from Data of Variable Quality. K. Crammer, M. Kearns, and J. Wortman. NIPS 2005
