28 July 2006

Loss versus Conditional Probability

There was a talk in the session I chaired at ACL about directly optimizing CRFs to produce high F-scores for problems like NE tagging and chunking. The technique is fairly clever and is based on the observation that you can maximize F-score with dynamic programming techniques very similar to those used to maximize log-probability in CRFs.

The details are not particularly important, but during the question phase, Chris Manning asked the following question: Given that F-score is not really motivated (i.e., is a type 4 loss), should we really be trying to optimize it? Conditional probability seems like a completely reasonable thing to want to maximize, given that we don't know how the tags will be used down the pipeline. (It seems Chris is also somewhat biased by a paper he subsequently had at EMNLP talking about sampling in pipelines.)

I think Chris' point is well taken. Absent any other information, conditional probability seems like quite a plausible thing to want to optimize, since given the true conditional probabilities, we can plug in any loss function at test time and do minimum Bayes risk decoding (in theory, at least).
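To make the "plug in any loss at test time" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names (`mbr_decode`, `zero_one`, `hamming`), the two-tag alphabet ("E" for entity, "O" for other), and the made-up posterior. It also searches only over the listed candidates, which a real system would not be limited to.

```python
def mbr_decode(posterior, loss):
    """Return the candidate with the lowest expected loss under `posterior`.

    posterior: dict mapping each candidate output to its conditional probability.
    loss: function loss(truth, guess) -> float.
    Note: the search is restricted to the keys of `posterior`.
    """
    def expected_loss(guess):
        return sum(p * loss(truth, guess) for truth, p in posterior.items())

    return min(posterior, key=expected_loss)


def zero_one(truth, guess):
    """Whole-sequence loss: 1 unless every tag matches."""
    return 0.0 if truth == guess else 1.0


def hamming(truth, guess):
    """Per-tag loss: number of positions that differ."""
    return float(sum(t != g for t, g in zip(truth, guess)))


# Hypothetical posterior over tag sequences for a two-word input.
posterior = {("E", "E"): 0.40, ("O", "O"): 0.35, ("O", "E"): 0.25}

best_seq = mbr_decode(posterior, zero_one)   # most probable whole sequence
best_tags = mbr_decode(posterior, hamming)   # fewest expected tag errors

print(best_seq, best_tags)
```

With these numbers the two losses disagree: zero-one loss picks the most probable sequence, while Hamming loss picks the least probable of the three, because it agrees with the per-position marginals.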

On the other hand, there is an interesting subtlety here. Conditional probability of what? The standard CRF optimization tries to maximize conditional probability of the entire sequence. The "alternative objective" of Kakade, Teh and Roweis optimizes the sum of the conditional probabilities of each label. These are two quite different criteria, and which one should we choose? In fact, neither really seems appropriate. Conditional probability of the sequence doesn't make sense because it would rather improve a bad label from probability 0.01 to 0.1 than improve a bad label from 0.4 to 0.6 and thus get it right. But summed conditional probability of labels doesn't make sense in NE tagging tasks because always assigning probability 0.9 to "not an entity" will do quite well. This is essentially the "accuracy versus F-score" problem, where, when few elements are actually "on," accuracy is a pretty terrible metric.
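The numbers in that argument are easy to check. The sketch below is only back-of-the-envelope arithmetic: it treats the sequence log-probability as if it split into per-label terms (a simplification, since a CRF normalizes globally), and the 100-token corpus with 5 entity tokens is invented for the example.

```python
import math

# Gain in log-probability from raising the correct label's probability.
gain_hopeless = math.log(0.1) - math.log(0.01)  # label is still wrong afterwards
gain_decisive = math.log(0.6) - math.log(0.4)   # label flips to correct

# Invented data: 100 tokens, 5 are entities (1), 95 are not (0).
gold = [1] * 5 + [0] * 95

# A model that puts probability 0.9 on "not an entity" everywhere.
p_not_entity = 0.9
summed_label_prob = sum(p_not_entity if g == 0 else 1 - p_not_entity for g in gold)

# Its hard predictions are "not an entity" for every token.
pred = [0] * len(gold)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0

print(round(gain_hopeless, 3), round(gain_decisive, 3))
print(round(summed_label_prob, 1), accuracy, f1)
```

The log-probability objective rewards the hopeless improvement (about 2.3 nats) far more than the decisive one (about 0.4 nats), and the do-nothing tagger collects 86 of a possible 100 in summed label probability and 95% accuracy while scoring an F-score of zero.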

If we take Chris' advice and desire a conditional probability, it seems what we really want is direct conditional probability over the chunks! But how do we formulate this and how do we optimize it? My impression is that a direct modification of the paper Chris was asking about would actually enable us to do exactly that. So, while the authors of this paper were focusing on optimizing F-score, I think they've also given us a way to optimize conditional chunk probabilities (actually this should be easier than F-score because there are fewer forward/backward dependencies), similar to what Kakade et al. did for conditional label probabilities.
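To pin down what "conditional probability over the chunks" could mean, here is a toy Python sketch under heavy assumptions. It is not the method from the paper, and it says nothing about training. It only computes the probability of a single chunk by brute-force enumeration of BIO tag sequences, which is feasible only for tiny inputs; a real system would use a forward-backward style dynamic program. The potentials in `emit` and `trans` and all function names are made up.

```python
import itertools
import math

TAGS = ("B", "I", "O")


def is_valid(tags):
    """BIO constraint: 'I' may only follow 'B' or 'I'."""
    prev = "O"
    for t in tags:
        if t == "I" and prev == "O":
            return False
        prev = t
    return True


def sequence_posterior(emit, trans):
    """Brute-force p(tags | x) for a tiny linear-chain model.

    emit[i][t] and trans[(s, t)] are made-up log-potentials;
    transitions missing from `trans` score 0.
    """
    scores = {}
    for tags in itertools.product(TAGS, repeat=len(emit)):
        if not is_valid(tags):
            continue
        s = sum(emit[i][t] for i, t in enumerate(tags))
        s += sum(trans.get(pair, 0.0) for pair in zip(tags, tags[1:]))
        scores[tags] = s
    z = sum(math.exp(s) for s in scores.values())
    return {tags: math.exp(s) / z for tags, s in scores.items()}


def has_chunk(tags, start, end):
    """True if a chunk covers exactly positions start..end (inclusive)."""
    if tags[start] != "B":
        return False
    if any(t != "I" for t in tags[start + 1:end + 1]):
        return False
    return end + 1 == len(tags) or tags[end + 1] != "I"


def chunk_probability(posterior, start, end):
    """Sum the probability of every tag sequence containing the chunk."""
    return sum(p for tags, p in posterior.items() if has_chunk(tags, start, end))


# Hypothetical log-potentials for a three-word input.
emit = [
    {"B": 1.0, "I": 0.0, "O": 0.5},
    {"B": 0.0, "I": 1.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 1.5},
]
trans = {("B", "I"): 0.5}

posterior = sequence_posterior(emit, trans)
p_chunk = chunk_probability(posterior, 0, 1)  # chunk spanning words 0 and 1
print(p_chunk)
```

An objective built on these quantities would sum (or take logs of) the probabilities of the gold chunks, in the same spirit as summing label marginals, but with chunks as the unit.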

Kevin said...

This is a very thought-provoking post. I was at the talk too but didn't make this connection. It's interesting that the answer to the critical question "What do we optimize?" isn't always clear in our problems. It would be really interesting if someone could empirically try the various optimization criteria for chunking/tagging and see how that REALLY affects the later stages in the pipeline. (Of course, then we need some goodness measure for the final stage too...)

Anonymous said...

Hi Hal,

Thanks for this thoughtful post. It would be great if you could mention, in a separate post, some of the interesting papers you saw at the conference. That would help those of us who could not make it.

Bob Carpenter said...

I totally agree with Chris on this.

We're using the confidence scores as counts in a corpus that we use for data mining and information retrieval of genes by name.

It's easy to convert a forward-backward lattice of tag probabilities to those of chunks. With a BIO-encoding of chunks as tags, check out Culotta and McCallum's Confidence Estimation for Information Extraction, somehow only accepted as a poster.

We used a Begin-Middle-End-Whole encoding of chunkings as taggings in LingPipe, and it makes it a whole lot easier to do extraction. It pulls out n-best chunks (or n-best whole analyses) with conditional probability scores at 330K/second. We just ran it over all of MEDLINE.

For what it's worth, pulling back most likely sequences vs. most likely tags is not always the same for POS, but the scores are always very close in my experience. We have tutorials on POS with confidence and entity extraction with confidence.

hal said...

Kevin -- I've wanted to do just that for parsing, perhaps with a summarization, EDT and MT system, but the overhead for trying such an experiment is daunting (not to mention the issue of engineering around syntax). Incidentally, Alex Fraser has done just this for alignments.

Bob -- I think I agree with Chris too, to a large degree. I'll have to read the Culotta and McCallum paper...in general I'm not a huge fan of these encodings for sequence segmentation (preferring direct segmentation models), but the paper sounds interesting.
