So here's the issue. I have some fixed amount of time (or some fixed number of dollars) in which to annotate data. In this time, I can annotate N data points with a noise rate of eta_N. Presumably eta_N approaches one half (for a binary task) as N increases. In other words, as N increases, (1 - 2 eta_N) approaches zero. A standard result in PAC learning states that a lower bound on the number of examples required to achieve 1-epsilon accuracy with probability 1-delta, with a noise rate of eta_N and a VC-dimension of h, is (h + log(1/delta)) / (epsilon (1 - 2 eta_N)^2).

This gives us some insight into the problem. It says that it is worth labeling more data (with higher noise) only if 1/(1 - 2 eta_N)^2 increases more slowly than N. So if we can label twice as much data and have the noise rate of this annotation rise by less than about 0.15 (starting from noise-free labels), then we're doing well. (Well in the sense that we can keep the bound the same and shrink either epsilon or delta.)
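To make the doubling argument concrete, here is a minimal sketch that evaluates the bound and solves for the break-even noise rate. The function names and the `factor` parameter are mine, not part of any library; the only thing taken from the post is the form of the bound.

```python
import math

def required_examples(h, eps, delta, eta):
    """Bound quoted in the post: (h + log(1/delta)) / (eps * (1 - 2*eta)^2)."""
    return (h + math.log(1.0 / delta)) / (eps * (1.0 - 2.0 * eta) ** 2)

def break_even_noise(eta, factor=2.0):
    """Noise rate at which `factor` times more data leaves the bound unchanged.

    Solves (1 - 2*eta_new)^2 = (1 - 2*eta)^2 / factor for eta_new < 1/2.
    Labeling `factor` times more data pays off (according to the bound)
    only if the new noise rate stays below this value.
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * eta) / math.sqrt(factor))

# Break-even noise after doubling the data, for a few starting noise rates.
for eta in (0.0, 0.05, 0.1, 0.2):
    print(f"eta={eta:.2f} -> break-even eta after doubling: {break_even_noise(eta):.3f}")
```

Starting from clean labels (eta = 0), doubling the data breaks even at a noise rate of about 0.146. The noisier the starting point, the smaller the extra noise you can afford for the same doubling.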

So how does this hold up in practice? Well, it's hard to tell exactly for real problems because running such experiments would be quite time-consuming. So here's a simulation. We have a binary classification problem with 100 features. The weight vector is random: the first 50 dimensions are Nor(0,0.2); the next 35 are Nor(0,5); the final 15 are Nor(m,1), where m is the weight of the feature whose id is the current feature id minus 35 (this gives feature correlation). We vary the number of training examples and the noise rate. We always generate an equal number of positive and negative points. We train a logistic regression model with hyperparameters tuned on 1024 (noisy) dev points and evaluate on 1024 non-noisy test points. We do this ten times for each setting and average the results. Here's a picture of accuracy as a function of data set size and noise rate:

And here's the table of results:

N \ eta    0      0.01   0.02   0.05   0.1    0.2
16       0.341  0.340  0.351  0.366  0.380  0.420
32       0.283  0.276  0.295  0.307  0.327  0.363
64       0.215  0.221  0.227  0.247  0.266  0.324
128      0.141  0.148  0.164  0.194  0.223  0.272
256      0.084  0.099  0.100  0.136  0.165  0.214
512      0.038  0.061  0.065  0.087  0.113  0.164
1024     0.023  0.034  0.044  0.059  0.079  0.123
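For anyone who wants to poke at this themselves, the simulation setup described above could be sketched roughly as follows. This is not the code behind the table. It assumes several things the post doesn't spell out: that the second argument of Nor is a standard deviation, that inputs are standard normal, that the clean label is the sign of w·x, and that noise flips each training label independently with probability eta. It also replaces the tuned logistic regression with plain SGD at fixed, untuned hyperparameters, so the numbers won't match the table.

```python
import math
import random

def make_weights(rng):
    """Draw the 100-dim weight vector (assumption: Nor(mean, std))."""
    w = [rng.gauss(0.0, 0.2) for _ in range(50)]
    w += [rng.gauss(0.0, 5.0) for _ in range(35)]
    # Last 15 weights are centred on the weight 35 positions earlier.
    for i in range(85, 100):
        w.append(rng.gauss(w[i - 35], 1.0))
    return w

def sample(rng, w, n, eta):
    """Return n class-balanced points; each label is flipped with prob. eta.

    Assumption: x ~ N(0, I) and the clean label is 1 when w . x > 0.
    """
    data, need = [], {1: n // 2, 0: n - n // 2}
    while need[0] or need[1]:
        x = [rng.gauss(0.0, 1.0) for _ in w]
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        if need[y] == 0:
            continue  # this class is already full; draw again
        need[y] -= 1
        data.append((x, 1 - y if rng.random() < eta else y))
    rng.shuffle(data)
    return data

def train(data, l2=0.01, lr=0.05, epochs=30):
    """Plain SGD on L2-regularised logistic loss (no bias term, untuned)."""
    theta = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            z = sum(t * xi for t, xi in zip(theta, x))
            z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
            g = 1.0 / (1.0 + math.exp(-z)) - y
            theta = [t - lr * (g * xi + l2 * t) for t, xi in zip(theta, x)]
    return theta

def error_rate(theta, data):
    """Fraction of points whose predicted label disagrees with the given label."""
    wrong = sum((sum(t * xi for t, xi in zip(theta, x)) > 0) != (y == 1)
                for x, y in data)
    return wrong / len(data)

def run(n, eta, seed=0, n_test=256):
    """Train on n noisy points, evaluate on n_test clean points."""
    rng = random.Random(seed)
    w = make_weights(rng)
    theta = train(sample(rng, w, n, eta))
    return error_rate(theta, sample(rng, w, n_test, 0.0))
```

Calling `run(n, eta, seed=s)` over a grid of `n` and `eta` values and averaging over several seeds would give a table of the same shape as the one above.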

The general trend here seems to be that if you don't have much data (N<=256), then it's almost always better to get more data at a much higher error rate (0.1 or 0.2 versus 0.0). Once you have a reasonable amount of data, it starts paying to be more noise-free. E.g., 256 examples with 0.05 noise is just about as good as 1024 examples with 0.2 noise. This roughly concurs with the theorem (at least in terms of the trends).
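A quick way to read these trade-offs off the table is to ask, for each N, how much noise you could accept at a larger N and still beat the clean result. The sketch below does that. It reads the table entries as test error rates (they fall as N grows), and the helper name `max_tolerable_noise` is mine.

```python
# Noise rates (columns) and error rates per training-set size (rows),
# copied from the table in the post.
ETAS = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]
ERR = {
    16:   [0.341, 0.340, 0.351, 0.366, 0.380, 0.420],
    32:   [0.283, 0.276, 0.295, 0.307, 0.327, 0.363],
    64:   [0.215, 0.221, 0.227, 0.247, 0.266, 0.324],
    128:  [0.141, 0.148, 0.164, 0.194, 0.223, 0.272],
    256:  [0.084, 0.099, 0.100, 0.136, 0.165, 0.214],
    512:  [0.038, 0.061, 0.065, 0.087, 0.113, 0.164],
    1024: [0.023, 0.034, 0.044, 0.059, 0.079, 0.123],
}

def max_tolerable_noise(n, factor=2):
    """Highest tabulated noise rate at which factor*n examples still beat
    n noise-free examples, or None if no tabulated rate does."""
    clean = ERR[n][0]
    better = [eta for eta, err in zip(ETAS, ERR[factor * n]) if err < clean]
    return max(better) if better else None

for factor in (2, 4):
    for n in ERR:
        if factor * n in ERR:
            print(f"N={n:4d} clean vs {factor}x data: "
                  f"tolerable eta up to {max_tolerable_noise(n, factor)}")
```

On these numbers the tolerable noise shrinks as N grows, which is the same trend described above.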

I think the take-home message that's perhaps worth keeping in mind is the following. If we only have a little time/money for annotation, we should probably annotate more data at a higher noise rate. Once we start getting more money, we should simultaneously be more careful and add more data, but not let one dominate the other.

Possibly relevant paper: Learning from Data of Variable Quality. K. Crammer, M. Kearns, and J. Wortman. NIPS 2005

but on the practical side - can you control the noise level? relaxing the annotation guidelines or telling the annotators that quantity is preferable might create way too much noise. the problem is even harder for soft classification, which is relevant to many nlp experiments.

fernando, thanks for the ref. it seems a great paper that was just added to my growing to-read list

Nice analysis.

It'd seem the next question here would be how to best use a pool of annotators. Theoretically, you could make assumptions about the correlations of errors and subsequent need for adjudication.

I'm guessing that like every other task of this kind, there's an enormous interpersonal variation in the speed/quality relation and offset that swamps just about every other concern. A hierarchical model might make sense here.

In the end, you also have to make assumptions about the highly non-linear relationship between time and quality. This'd actually have to be measured if you wanted to optimize production from your annotation farm.

It makes a big difference on speed/accuracy if users can just ignore a very hard case. When I was annotating for stemming and call routing, this was a serious concern, as some of the cases were simply inscrutable.

And then there's all the interaction with active learning concerns, which themselves interact with evaluation (0/1 vs. log prob). My guess is that active learning probably picks harder problems in general than random selection.

Next up, there's learn-a-little, tag-a-little, and its effect on both accuracy (it'd seem natural that it'd induce some bias in errors), and time. Of course, there's the stage of training which interacts with all of this.

If you're looking at overall time, I think there are huge improvements to be made at the GUI level. My current NE setup, which we're about to release, lets me chunk bibliographic references into types (e.g. author, title, journal, year) at 1000 tokens/second. I can do newswire into person/location/organization more than twice that fast. You can drive it quickly using only a keyboard, in contrast to standard approaches based on text editors. It took me about a week to build the whole thing, including all the corpus management.

Correction -- that was 1000 tokens/hour for entity tagging by hand in our new interface, not 1000 tokens/second.

bob:

there is a paper that suggests that active learning does indeed select harder examples for the human annotator:

Investigating the Effects of Selective Sampling on the Annotation Task. B. Hachey, B. Alex, and M. Becker. Proceedings of CoNLL 2005
