25 August 2006

Doing Named Entity Recognition? Don't optimize for F1

(Guest post by Chris Manning. Thanks Chris!)

Among ML-oriented nlpers, using a simple F1 of precision and recall is the standard way to evaluate Named Entity Recognition. Using F1 seems familiar and comfortable, but I think most nlpers haven't actually thought through the rather different character that the F1 measure takes on when applied to evaluating sequence models. It's not just that it's a type 4 loss (a simple, intuition-driven measure like accuracy): In most cases such measures are reasonable enough for what they are, but using F1 for NER has an under-appreciated dysfunctional character. You wouldn't want to optimize for it!

This post explains what I was really thinking about when I made the comment that Hal referred to previously (fortunately, I didn't try to say all this after the talk!). I agree with Hal: the paper was a very nice piece of work technically. I just think that the authors, Jun Suzuki et al., chose a bad peak to climb.

Everyone is familiar with the F1 measure for simple classification decisions. You draw a 2x2 contingency table of whether something should be yes/no, and whether the system guessed yes/no, and then calculate the harmonic mean of precision and recall. But now think about Named Entity Recognition. You're chugging through text, and every now-and-again there is an entity, which your system recognizes or doesn't or fantasizes. I will use the notation word/GOLD/GUESS throughout, with O denoting the special background class of not-an-entity. So there are stretches of plain text (drove/O/O along/O/O a/O/O narrow/O/O road/O/O). These are the non-coding regions of NER. Then there are sequences (of one or more tokens) where there was an entity and the system guessed it right (in/O/O Palo/LOC/LOC Alto/LOC/LOC ./O/O), where there was an entity but the system missed it (in/O/O Palo/LOC/O Alto/LOC/O ./O/O), and where there wasn't an entity but the system hypothesized one (an/O/O Awful/O/ORG Headache/O/ORG ./O/O).

Things look good up until here: those events map naturally on to the true negatives (tn), true positives (tp), false negatives (fn), and false positives (fp) of the simple classification case. The problem is that there are other events that can happen. A system can notice that there is an entity but give it the wrong label (I/O/O live/O/O in/O/O Palo/LOC/ORG Alto/LOC/ORG ./O/O). A system can notice that there is an entity but get its boundaries wrong (Unless/O/PERS Karl/PERS/PERS Smith/PERS/PERS resigns/O/O). Or it can make both mistakes at once (Unless/O/ORG Karl/PERS/ORG Smith/PERS/ORG resigns/O/O). I'll call these events a labeling error (le), a boundary error (be), and a label-boundary error (lbe).

I started thinking along these lines just as an intuitive, natural way to characterize happenings in NER output, where entities are sparse occurrences in stretches of background text. But you can make it formal (I wrote a Perl script!). Moving along the sequence, the subsequence boundaries are: (i) at start and end of document, (ii) anywhere there is a change to or from a word/O/O token from or to a token where either guess or gold is not O, and (iii) anywhere that both systems change their class assignment simultaneously, regardless of whether they agree. If you chop into subsequences like that, each can be assigned to one of the above seven classes.
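The segmentation rules above can be sketched in a few lines of Python. This is not the Perl script mentioned in the post, only a minimal reconstruction under stated assumptions: tags carry no B-/I- prefixes, an entity is a maximal run of one non-O label, and the way `classify` separates be from lbe (same label sets versus different ones) is a guess at the intended definition. The function names are made up for illustration.

```python
from itertools import groupby

O = "O"  # background (not-an-entity) label


def entities(labels):
    """Maximal runs of one non-O label, as (start, end, label), end exclusive."""
    ents, i = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label != O:
            ents.append((i, i + n, label))
        i += n
    return ents


def segment(gold, guess):
    """Cut the token sequence into (start, end) event subsequences."""
    cuts = [0]  # rule (i): start of document
    for i in range(1, len(gold)):
        was_bg = gold[i - 1] == O and guess[i - 1] == O
        is_bg = gold[i] == O and guess[i] == O
        # rule (ii): transition between word/O/O and anything else
        # rule (iii): gold and guess both change label at the same point
        both_change = gold[i - 1] != gold[i] and guess[i - 1] != guess[i]
        if was_bg != is_bg or both_change:
            cuts.append(i)
    cuts.append(len(gold))  # rule (i): end of document
    return list(zip(cuts, cuts[1:]))


def classify(gold, guess):
    """Assign one event subsequence to one of the seven classes."""
    g, h = entities(gold), entities(guess)
    if not g and not h:
        return "tn"
    if g == h:
        return "tp"
    if not h:
        return "fn"
    if not g:
        return "fp"
    if [e[:2] for e in g] == [e[:2] for e in h]:
        return "le"  # same spans, different labels
    same_labels = {e[2] for e in g} == {e[2] for e in h}
    return "be" if same_labels else "lbe"  # assumed reading of be vs. lbe


def events(tagged):
    """Classify every event in a string of word/GOLD/GUESS tokens."""
    _, gold, guess = zip(*(t.rsplit("/", 2) for t in tagged.split()))
    return [classify(gold[s:e], guess[s:e]) for s, e in segment(gold, guess)]
```

For example, `events("Unless/O/PERS Karl/PERS/PERS Smith/PERS/PERS resigns/O/O")` gives one boundary error followed by one stretch of background text.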

Now, the thing to notice is that for the first 4 event types, you are either correct or you get 1 demerit, assessed to either precision or recall. In the simple classification case, that's the end of the story and the F1 measure is sensible. But when doing precision and recall over subsequences, there are these other three event types. Each of them is assessed a minimum of 2 demerits, with both precision and recall being hit. Therefore, it is fairly clear that optimizing for F1 in this context will encourage a system to do the following: if I'm moderately uncertain of either the class label or the boundaries of the entity, because a mistake would cost me a minimum of 2 demerits, I'm better off proposing no entity, which will cost me only 1 demerit.
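To make the incentive concrete, here is a small arithmetic sketch. The baseline counts are invented purely for illustration; only the ordering of the three outcomes matters.

```python
def f1(tp, fp, fn):
    """Entity-level F1 from counts of correct, spurious, and missed entities."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical baseline counts, chosen only for illustration.
tp, fp, fn = 80, 10, 10

# One more gold entity arrives, and the system is unsure about it.
abstain = f1(tp, fp, fn + 1)         # propose nothing: 1 recall demerit
wrong_span = f1(tp, fp + 1, fn + 1)  # propose with a wrong boundary or label: 2 demerits
correct = f1(tp + 1, fp, fn)         # propose and get it exactly right

# Abstaining scores strictly better than a near miss.
print(wrong_span < abstain < correct)
```

Since F1 equals 2tp / (2tp + fp + fn), a near miss adds two to the denominator where abstaining adds only one, so the same ordering holds for any baseline counts.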

(Two notes:

(i) As I've defined events, the number of demerits possible for an event in the last three classes is unbounded, though in practice 2 is the most common case. For example, this lbe event would be assessed 4 demerits (3 to precision, and 1 to recall): Smith/ORG/PERS and/ORG/O Newcomb/ORG/PERS and/ORG/O Co./ORG/ORG.

(ii) Despite my title, the problem here isn't with the F measure per se, as Bob Moore emphasized to me at a coffee break during ACL 2006 (thanks!). The problem would occur with any measure that combines precision and recall and which is increasing in both arguments, such as the simple arithmetic mean of precision and recall.)
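The demerit counts in note (i) can be checked mechanically. The sketch below assumes exact-match scoring over entities taken as maximal runs of one non-O label (no B-/I- prefixes); the helper names are illustrative, not from any existing scorer.

```python
from itertools import groupby


def entity_set(labels):
    """Entities as a set of (start, end, label); end is exclusive."""
    ents, i = set(), 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label != "O":
            ents.add((i, i + n, label))
        i += n
    return ents


def demerits(tagged):
    """Count exact-match precision and recall errors in word/GOLD/GUESS text."""
    _, gold, guess = zip(*(t.rsplit("/", 2) for t in tagged.split()))
    g, h = entity_set(gold), entity_set(guess)
    return {
        "precision": len(h - g),  # guessed entities with no exact gold match
        "recall": len(g - h),     # gold entities with no exact guessed match
    }


print(demerits("Smith/ORG/PERS and/ORG/O Newcomb/ORG/PERS and/ORG/O Co./ORG/ORG"))
```

On the Smith and Newcomb example this reports 3 precision demerits and 1 recall demerit, while a plain miss such as in/O/O Palo/LOC/O Alto/LOC/O ./O/O costs a single recall demerit.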

Observe that this behavior is the opposite of the way things were meant to work: people adopted F1 in IR rather than using accuracy because accuracy gives high scores to a system that returns no documents, which obviously isn't useful. But, here, optimizing for F1 is encouraging a system to not mark entities.

Now let's look at some data. I used this event classification system on the output of my NER system on the CoNLL 2003 shared task English testa data. Here is how many events of each type there were:

tn 5583
tp 4792
fn 118
fp 120
le 472
be 102
lbe 75

Note in particular that over 2/3 of the errors are in those 3 extra categories that are multiply penalized. The ratios of classes vary with the task. For example, in biological NER, you tend to get many more boundary errors. But in my experience it is always the case that lots of the errors are in the last 3 classes.
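The "over 2/3" figure follows directly from the counts above. The second number in this sketch is a lower bound only, derived from the stated minimum of 2 demerits per le, be, or lbe event.

```python
counts = {"tn": 5583, "tp": 4792, "fn": 118, "fp": 120,
          "le": 472, "be": 102, "lbe": 75}

single = counts["fn"] + counts["fp"]                 # events costing exactly 1 demerit
multi = counts["le"] + counts["be"] + counts["lbe"]  # events costing at least 2 demerits

share_of_errors = multi / (single + multi)

# Each multiply penalized event costs at least 2 demerits, so this is a lower bound.
min_share_of_demerits = 2 * multi / (single + 2 * multi)

print(f"{share_of_errors:.1%} of error events, at least {min_share_of_demerits:.1%} of demerits")
```

So 649 of the 887 error events (about 73%) fall in the three extra classes, and they account for at least about 85% of all demerits.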

Moreover, some of the errors in the le and be classes are not that bad, and sometimes even reflect subtle judgement calls and human annotator inconsistency in the gold standard. For instance, in the GENIA data you can find both regulation/O of/O human/DNA interleukin-2/DNA gene/DNA expression and transduction/O to/O the/O human/O IL-2/DNA gene/DNA, where it is unclear whether to include human in the name of the gene. Or in a newswire phrase like the Leeds stadium, it's not always very clear whether Leeds should be tagged ORG as a reference to the football team or LOC as a reference to the city. In almost any imaginable task, you would prefer systems that made these errors to ones that missed such entities entirely. In other words, the F1 measure is punishing more severely mistakes that should be punished less according to reasonable intuitions of task utility.

Has this been noticed before? I think so. The ACE program has a system for giving partial credit. But most ML people react very negatively to a scoring system that you couldn't possibly write on a napkin and which involves various very arbitrary-looking constants.... Do these observations undermine the last decade of work in NER? I don't think so. It turns out that there are lots of measures that are pretty okay providing you do not specifically optimize for them, but are dysfunctional if you do. A well-known example is traditional readability measures.

p.s. As I finish writing this guest post, it's occurred to me that I think this is the first nlpers post with some actual natural language examples in it. If you're reading this post, I guess that at least shows that such content isn't actively filtered out!

hal said...

Very interesting -- I'm glad you put so much effort into thinking about this so the rest of us don't have to :).

I'd like first to comment a bit about the ACE measure (restricted to NE tagging). Here, essentially what you do is create a bipartite graph. One half is the "true mentions" and the other half is the "system detected mentions." We then do a matching between them (ignore how for a second) and compute a final score based on this matching. For instance, you get docked some points if types don't match across links. Matches that don't overlap by some character-based minimum amount (e.g., 90% of the characters) are disallowed. You also get penalized for misses (unmatched elements on the truth side) or false alarms (unmatched elements on the system side). The way the matching is actually computed is so as to MAXIMIZE your end score: this is a straightforward bipartite matching problem and can be solved efficiently.
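A toy version of this matching-based scoring might look like the following. It is emphatically not the official ACE scorer: every constant is hypothetical, the overlap definition is an assumption, and the exhaustive search over assignments stands in for a proper bipartite matching algorithm and is only usable on tiny inputs.

```python
from itertools import permutations

MIN_OVERLAP = 0.9    # assumed threshold, echoing the "90%" mentioned above
TYPE_MISMATCH = 0.5  # hypothetical partial credit when types differ
FALSE_ALARM = -0.5   # hypothetical penalty per unmatched system mention


def overlap(a, b):
    """Shared characters as a fraction of the longer span (an assumed definition)."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / max(a[1] - a[0], b[1] - b[0])


def pair_value(true, sys):
    """Credit for linking two (start, end, type) mentions, or None if disallowed."""
    if overlap(true[:2], sys[:2]) < MIN_OVERLAP:
        return None
    return 1.0 if true[2] == sys[2] else TYPE_MISMATCH


def best_score(truth, system):
    """Score of the best matching; misses earn nothing, false alarms are penalized."""
    slots = list(range(len(system))) + [None] * len(truth)
    best = float("-inf")
    for assignment in permutations(slots, len(truth)):
        total, used, allowed = 0.0, set(), True
        for t, j in zip(truth, assignment):
            if j is None:
                continue  # this true mention is left unmatched (a miss)
            value = pair_value(t, system[j])
            if value is None:
                allowed = False
                break
            total += value
            used.add(j)
        if allowed:
            total += FALSE_ALARM * (len(system) - len(used))
            best = max(best, total)
    return best


truth = [(0, 9, "LOC"), (20, 30, "PER")]
system = [(0, 9, "ORG"), (40, 45, "PER")]
print(best_score(truth, system))
```

Here the best matching links the first pair for half credit, leaves the second true mention as a miss, and charges one false alarm.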

The major thing that I think the ACE score misses that Chris talks about is the issue of span. According to ACE, either spans overlap or they don't. 90% (IIRC) of characters in common is a match, 89.9% is not. This, for instance, would not allow for many of Chris' examples. Moreover, it doesn't fix the problem that only getting "human" in "human/DNA interleukin-2/DNA gene/DNA" is probably much worse than only getting "gene." But to know that, e.g., "gene" is the head, or that last names are more important (generally) than first, we would need to actually annotate this, or come up with heuristics (likely akin to Mike Collins' head-finding heuristics for NPs).

A second issue is believability, which Chris also mentioned in terms of "ML people react very negatively to a scoring system that you couldn't possibly write on a napkin and which involves various very arbitrary-looking constants." I think there are two reasons for this.

(1) Complex loss functions with weird constants are just not believable. As I talked about before, we really want loss functions that generalize, not ones that just fit the data. If we fit the data using a lot of features and weights, but this is not intuitive (our "prior" says "crazy!"), we're not going to believe it will generalize. I don't think the ACE metric is all that bad in this respect (I've seen far worse). But I have had lengthy conversations with hardcore ML people who think that even something like BLEU is too complex.

(2) Practically speaking, complex loss functions are hard to optimize. Hamming loss is just so easy in comparison to really anything else (even F1 is hard, as is noted by the technical difficulty in the paper that brought up this conversation). Directly optimizing ACE or BLEU is quite difficult and the sorts of techniques that we (as machine learning folk) use are not really up to snuff here.

One question that this brings up -- and I don't really think it has an answer (but I'll ask it anyway) -- is: how can we tell if a loss function is good? Of course, if we can compare it to real evaluations, this is great, but what if we can't (e.g., in parsing or NE tagging)? Is F1 good for parsing, for instance? Is the ACE metric good, or at least better than F (in some formal sense)? Is there a way to tell whether a metric is easy to game?

Okay, a long post gets a long reply, but I'll stop here :). Thanks again to Chris for putting this together.

Anonymous said...

The "problem" Chris raises can be attributed to focusing on first-best solutions. I like to think of the problem of entity extraction as more like search and less like first-best classification (though there is clearly a deep connection between search and classification). Downstream applications typically focus on gathering/tracking information about some known entities (e.g. protein p53 and things it regulates in humans), or on mining data to discover relationships or patterns (e.g. "species=human, regulator:p53, regulated: human insulin-like growth factor II").

Suppose you have a system that can return multiple mentions and spans from a text input and give them scores that are comparable across docs. We like to use conditional probabilities p(mention of type T from position n to n+k|text) here, because they're easy to combine with downstream processing like social network analysis and because they have the desirable cross-document comparability. For instance, here's some output drawn from LingPipe's Named Entity Tutorial (trained from NCBI's GeneTag corpus):

p53 regulates human insulin-like growth factor II gene expression through active P4 promoter in rhabdomyosarcoma cells. Mentions by confidence: "p53":0.9999, "p4 promoter":0.7328, "insulin-like growth factor II gene":0.6055, "human insulin-like growth factor II gene":0.3817, "active P4 promoter":0.1395, "P4":0.0916, "active P4":0.0088, "insulin-like growth factor II":0.0070, "human insulin-like growth factor II":0.0044, ... The numbers are conditional probability estimates of the phrase being a gene given the input text (this is actually done by span, not by phrase, but the above is easier to read with the limited formatting available in this blog). You can see that it's just not sure if "human" should be in there, just as in Chris's example.

A ranking of entities along with a reference gold standard allows you to draw a precision-recall curve, compute mean-average precision (MAP), precision-recall breakeven point (BEP), compute precision-at-n-documents, and so on. For instance, for genes, we can do 99.5% recall at 10% precision for search, or tighten precision down to 99% for mining applications.
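As a rough illustration of scoring a ranked list against a gold standard, here is a minimal average-precision sketch. The hit/miss list is invented; it is not derived from the LingPipe output above, for which no gold labels are given.

```python
def average_precision(ranked_hits, n_gold):
    """Mean of the precision values at each rank where a gold entity is found.

    ranked_hits: booleans, best-scored candidate first (True = matches gold).
    n_gold: number of gold entities, including any that were never returned.
    """
    tp, total = 0, 0.0
    for k, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            total += tp / k  # precision at this cutoff
    return total / n_gold


def mean_average_precision(runs):
    """MAP over several (ranked_hits, n_gold) pairs, e.g. one per document."""
    return sum(average_precision(h, n) for h, n in runs) / len(runs)


# Hypothetical ranking: 5 candidates returned, 3 gold entities in total.
hits = [True, True, False, True, False]
print(average_precision(hits, 3))
```

With these made-up hits the value is (1/1 + 2/2 + 3/4) / 3.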

Perhaps even more importantly, we can combine information across mentions. This was originally done statistically by Mark Craven, I believe. Most simply, we can combine scores from all mentions of "P4 promoter" and estimate its total count in a corpus. This allows very high precision extraction if we take high estimated count events. We can also use this kind of output as the basis of rescoring by downstream models such as relation extractors (as done by Dan Roth) and coreference resolvers (Heng Ji and Ralph Grishman), though that's typically done with whole-sentence n-best (which LingPipe also does) rather than per-entity n-best.
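The simplest form of this combination is to sum per-mention probabilities, which by linearity of expectation gives an expected count for each phrase. The sketch below assumes case-insensitive phrase matching, and the mention probabilities are invented for illustration.

```python
from collections import defaultdict


def expected_counts(mentions):
    """Sum per-mention probabilities into an expected corpus count per phrase."""
    totals = defaultdict(float)
    for phrase, prob in mentions:
        totals[phrase.lower()] += prob  # assumed normalization: lowercase only
    return dict(totals)


# Hypothetical (phrase, probability) pairs gathered from several documents.
mentions = [
    ("P4 promoter", 0.73),
    ("p4 promoter", 0.91),
    ("P4 promoter", 0.64),
    ("active P4 promoter", 0.14),
]

counts = expected_counts(mentions)
confident = {p for p, c in counts.items() if c >= 2.0}  # arbitrary cutoff
print(counts, confident)
```

Thresholding on the expected count, as in the last step, is one way to trade recall for the high precision described above.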

Historical note: The whole alignment/partial-credit thing goes back to MUC. It's possible to remove the alignment part of it and just give partial credit for overlaps with the same type or exact span matches with different types and to break all that down by type and what not. But it's still not very easy to understand the final numbers. As Chris's figures show (and they're typical), most errors get partial credit, so this scoring often comes close to halving error rates. DARPA's performance figure reverse engineering is interesting from both a technical and sociological/organizational/marketing perspective.

hal said...

While I agree that it is often important to not just provide a single-best output but either an n-best list or a distribution over outputs, I feel that this is something of a side issue. (Of course, if we're trying to produce probabilities, then the whole "optimizing F1" is irrelevant since we'll instead be optimizing for conditional probability.) It would of course be possible to extract n-best lists from a system trained using Suzuki et al.'s technique, but if the loss being optimized is fundamentally wrong (as Chris contends) then doing so won't help us.

In the end (*), what we want is a loss function for NE rec such that low loss implies good inputs for systems further down the pipeline (since there are only very few imaginable scenarios in which NER is the final stage). It may turn out, as Chris contends, that conditional probability is a pretty darn good option. But whether we use an n-best list or not seems tangential to me.

(*) Of course, it may be possible to directly optimize NER performance on the basis of final task performance, without inventing a new loss function. In fact, I hope this is true :).

Anonymous said...

Think about this when deciding whether first-best is enough: when's the last time you've pressed Google's "I'm Feeling Lucky" button? A system that's limited to a first-best state-of-the-art 80-85% recall is going to miss a lot of entities. I believe a relevant task eval is the precision you can deliver at 99.5% recall. Or what recall you can get at 95% precision.
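Both operating points can be read off a ranked candidate list. This is a minimal sketch with an invented ranking; the function names are illustrative.

```python
def pr_points(ranked_hits, n_gold):
    """(precision, recall) after each rank cutoff k = 1..len(ranked_hits)."""
    points, tp = [], 0
    for k, hit in enumerate(ranked_hits, start=1):
        tp += bool(hit)
        points.append((tp / k, tp / n_gold))
    return points


def precision_at_recall(points, min_recall):
    """Best precision among cutoffs reaching min_recall, or None if none does."""
    reachable = [p for p, r in points if r >= min_recall]
    return max(reachable) if reachable else None


def recall_at_precision(points, min_precision):
    """Best recall among cutoffs holding min_precision, or None if none does."""
    reachable = [r for p, r in points if p >= min_precision]
    return max(reachable) if reachable else None


# Hypothetical ranking: 7 candidates returned, 4 gold entities in total.
points = pr_points([True, True, False, True, False, False, True], n_gold=4)
print(precision_at_recall(points, 1.0), recall_at_precision(points, 0.95))
```

A first-best system reports only one point on this curve, which is why it cannot answer either question.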

Optimizing for first-best can actually hurt performance at different precision/recall points. Our own long-distance models are more attenuated than the more local ones; they're about 10% absolute better on F measure, but much worse on precision-at-100 docs, area-under-ROC curve, or MAP-type measures.

Assuming that partial matches are desirable, as Chris did, assumes that downstream processes have the means to correct noisy first-best output. I think it's easier to combine and rescore than it is to correct, but that's just a hunch.

Scoring by cross-entropy estimates of the gold standard versus a system model makes a lot of sense. The score is just the sum of the estimated log probs for the reference annotations. It works not only for NE, but for just about any task you can imagine. Not only for evaluation, but for things like EM training. It also makes sense with either joint or conditional estimates.
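As a sketch of this kind of scoring, the function below sums the log probabilities a system assigns to the reference annotations. The annotation tuples and probabilities are made up, and the systems are assumed to expose properly normalized estimates.

```python
import math


def reference_log_prob(reference, probs):
    """Sum of log probabilities the system assigns to the reference annotations.

    reference: iterable of annotations, here (start, end, type) tuples.
    probs: mapping from annotation to the system's probability estimate.
    Higher (closer to 0) is better; an annotation given zero mass yields -inf.
    """
    total = 0.0
    for ann in reference:
        p = probs.get(ann, 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# Hypothetical reference annotations and two hypothetical systems.
reference = [(0, 3, "GENE"), (14, 46, "GENE")]
system_a = {(0, 3, "GENE"): 0.90, (14, 46, "GENE"): 0.60}
system_b = {(0, 3, "GENE"): 0.99, (14, 46, "GENE"): 0.10}

score_a = reference_log_prob(reference, system_a)
score_b = reference_log_prob(reference, system_b)
print(score_a > score_b)
```

System B is more confident on the easy mention but is punished heavily for putting little mass on the second one, so system A scores better overall.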

The advantage of scoring by cross-entropy is that developers don't need to write decoders -- just scorers. The problem for a bakeoff scoring by cross-entropy is that it assumes systems are properly normalized probabilistic systems, which rules out SVMs, TF/IDF, heuristic pattern filters (unless they can be integrated with rescoring), etc.

Scoring ranked n-best entities a la TREC, as I suggested, eliminates normalization errors (and cheating possibilities). But it requires ranked output, and for best performance, n-best. I believe downstream systems require n-best anyway, so this doesn't seem like much of an imposition.

hal said...

I more-or-less agree that first best is not always the best way to evaluate a system, especially one that is going to be embedded within a larger system. There are many options for how to embed, the simplest of which are probably n-best lists or samples from the posterior (see Chris' paper).

All I'm saying is that I think that the issue of F1 being suboptimal for optimization is independent of the one-best versus many distinction. If the loss you're optimizing is fundamentally broken, then you're essentially leaving it up to chance that it's somehow "close enough" that within an n-best list, you'll get something reasonable. But this is a strong assumption. I think that if you believe what Chris is saying, then you must believe it whether you are producing single best outputs or n-best outputs.

Abhishek said...

Given that I have recognized the boundaries of named entities in a large text corpus, are you aware of any unsupervised technique that classifies these entity candidates?

hal said...

Abhishek: I'd take a look at Collins and Singer's bootstrapping approach for that. (Plus things that cite it.)

Anonymous said...

I'm a little slow off the bat, but:

I am interested in some clarifications of Chris's error classification. The described segmentation method would leave some cases accounted for very strangely (though they may be rare in English NER).

For example:

Christopher|I-PER|I-PER Manning|I-PER|B-PER will be marked as a single BE, presumably. Will Christopher|I-PER|I-PER Manning|I-PER|I-ORG be an LBE?

With this segmentation, if a system tags an entire sentence with one tag, or with many tags which continually overlap with gold-standard entities but whose boundaries never coincide, it will be counted as a single segment, and judged as a maximum of 1 error. Surely this is just as useless a result.

I am basically not convinced that we have a clear way of counting boundary errors.

Instead, perhaps, we could give a score for each correct and predicted annotation, in a way that simple cases (exact matches, boundary errors, etc.) will award two agreeing points:

Unless/O/B-PERS Karl/B-PERS/B-PERS Smith/B-PERS/B-PERS resigns/O/O

Here correct Karl Smith corresponds to predicted Unless Karl Smith. From the perspective of either annotation, this is a boundary error.

Christopher|I-PER|I-PER Manning|I-PER|B-PER

This might be scored as two boundary errors from the perspective of the predictions, and one (label-)boundary error from the perspective of the correct entity.

But this scoring is biased against false positives and false negatives, and probably has a number of other faults.

Anonymous said...

When computing precision, recall, and F-measure, do you exclude non-coding regions of the text (e.g., "drove/O/O along/O/O a/O/O")?

It seems that if you include non-coding regions, a set of evaluation documents with very few entities will produce artificially high results. Technically, however, a classification result of "not an entity" is still a classifier result (capable of being a false positive, false negative, true positive, etc.). Any thoughts on this? Should non-coding tokens be included or not?

Anonymous said...

Youth is warcraft leveling not a time of life;warcraft leveling it is a wow lvl state of mind; wow power level it is not power leveling amatter of World of warcraft Power Leveling rosy cheeks, red wrath of the lich king power leveling lips and supple knees;WOTLK Power Leveling it is a matter of thewill,wlk Power Leveling a quality of buy aoc gold the imagination,aoc gold a vigor of the emotions; it is thefreshness of the deep springs wow gold of life. Youth means a tempera-mental maplestory mesos predominance of courage over timidity, of the appetite formaple story mesos adventure over the love of ease. wow gold This often existsin a man of 60 more than a boy of 20. Nobody grows old merely by anumber of years.

Anonymous said...

Ultima Online Gold, UO Gold, crestingwait
lotro gold
wow gold
warhammer gold
Age of Conan Gold, AOC Gold

Anonymous said...

Do you know Rose zuly? I like it.
My brother often go to the internet bar to buy rose zulie and play it.
After school, He likes playing games using these rose online zuly with his friend.
I do not like to play it. Because I think that it not only costs much money but also spend much time. One day, he give me many rose online zulieand play the game with me.
I came to the bar following him and found Arua ROSE zulywas so cheap. After that, I also go to play game with him.

Do you know Scions Of Fate gold? I like it.
My brother often go to the internet bar to buy SOF gold and play it.
After school, He likes playing games using these Scions Of Fate money with his friend.
But I do not like to play it. Because I think that it not only costs much money but also spend much time. One day, he give me many cheap SOF goldand play the game with me.
I came to the bar following him and found buy sof goldwas so cheap. After that, I also go to play game with him.

Anonymous said...
Anonymous said...

Anonymous said...

After being wow gold informed of wow power leveling the problem, wow power leveling their daughter's date dog apparel said he could get the peanut out.wow power leveling With that, Wow Power Level the pilot threw open dog clothing the door and jumped from the plane.flyff power leveling the young man's Atlantica power leveling sunburn started power leveling acting up again.dog clothes He asked to be excused,dog clothes wholesale went into the kitchen power leveling The executoner said that if pet clothing this happens a second archlord power leveling time throws out a grenade and says, "i'm in the army, world of warcraft gold i can get these whenever i need them."dog clothes so they all land pet clothes safely

Anonymous said...

Anonymous said...

I am grateful to you for this great content.aöf thanks radyo dinle cool hikaye very nice sskonlycinsellik very nice ehliyet turhoq home free kadın last go korku jomax med olsaoy hikaye lesto go müzik dinle free only film izle love aşk 09sas mp3 indir

Anonymous said...
Anonymous said...
Anonymous said...

to a scoring system that you couldn't possibly write on a napkin and which involves various very arbitrary-looking constants." I think there are two reasons for this.
Dissertation Writing | Essay Writing | Research Paper Writing

Anonymous said...

Great post, i really appreciate it.
Term Paper Writing | Thesis Writing

term paper guide said...

Great blog, people get lots of information keep on posting this type of attractive articles.

custom paper guide

college term paper

Research Papers said...

I'll be glad to hear any other news from you.
Essay Writing | Buy Research Paper | College Essays

Writing a Research Paper said...

Many institutions limit access to their online information. Making this information available will be an asset to all.

Anonymous said...

Student can get term papers and custom term papers help online through many websites.

Anonymous said...

Wonderful blog, i recently come to your blog through Google excellent knowledge keep on posting you guys.

Dissertation services
Dissertation writing

Joe said...

Thank you for great article. Where else could anyone get that kind of information in such a perfect way of presentation.
Oes Tsetnoc

Joe said...

this is great information that i know a lot of people are interested in.
Kerja Keras Adalah Energi Kita | Hosting Murah | Kerja Keras Adalah Energi Kita

Methey said...

I like the way you explain things.

Babies | Bayi

deLi said...
Baby said...

Custom Essay said...

Hi,
I personally like your post; you have shared good insights and experiences. This post will really help beginners, although it is basic but, it will help others in great deal in future. Keep it up.

College Research Paper said...

I appreciate the work of all people who share information with others.

Anonymous said...

Hi,
I personally like your post; you have shared good information.

Unknown said...

Hi,
Thank you for sharing information in the blog. You are really doing a good work. I personally like this blog and appreciates your efforts.

Assignment Writing

Unknown said...

Hi,
It must've taken you a bit of time, so thanks for taking the time to do so, I appreciate it, and this post is just great.
Coursework help

Essay Paper said...

Hi,

Anonymous said...

Thank you very much for a kind of the hottest data just about this post ! You have to ground your dissertation, I think. Because some thesis writing services do such things and you are able accomplish really good format thesis too.

PDF BOOKS said...

Thanks for writing this informative article.

pdf books
free pdf books
online pdf books

Hey,

Custom Term Papers said...

Nice post! Very complete and detail information. That’s what I need! Well done!
Custom Term Paper

Dissertation said...

Great, I know I've wanted a larger test ACE a few times, glad to see someone created one!

Unknown said...

Are you a fashion chaser? And do you like polo shirts, which are very chic, especially the polo t shirts, I love them very much. I also like playing tennis rackets, it can keep healthy, what do you like to do?
We are so honour to say that we are the outlet of
polo t shirts women
polo t shirts on sale
polo t shirts for women
polo shirts on sale
these products are warmly welcomed by our customers. And what's more we also sell
polo shirts men
men's polo shirt
men polo shirt
mens polo shirts
mens polo shirt
and the high-quality
cheap polo shirts
discount polo shirts
men's polo shirts
women's polo shirts
Our store are also famous for
cheap tennis racket
discount tennis racket
and the main product is
prince tennis racquet
wilson tennis racket
babolat tennis racquet
In this case, if you have time, you can buy many excellent things in our store, and we can assure you that we will not let you down.

Unknown said...
This comment has been removed by the author.
Unknown said...
This comment has been removed by the author.
Unknown said...

that is a great kind of information peoples easily atract with it
Thesis Writing | Dissertation Writing | Essay Writing | Assignment Writing

Unknown said...
This comment has been removed by the author.
Unknown said...
This comment has been removed by the author.
Anonymous said...

I am interested in is that Chris mistake classificatiion some clarification. Segmentation method will described in some cases, very strange, but they may be in English net enrollment rate of the rare.

Thesis Help | Dissertation Help | Essay Help | Assignment Help

Custom Thesis Writing said...

this kind of blog always useful for blog readers, it helps people during research. your post is one of the same for blog readers.

Thesis Papers Writing

Amber Melton said...

Good post! Very comprehensive and detailed information. This is what I need! Well done!
Social Media Marketing Press Release Submission Social Media Tips Press Release Submission

Many institutions limit access to their online information. Making this information available will be an asset to all.

Anonymous said...
college essays said...

Great idea..thanks for sharing..I will forward it to my friends...
Term Papers | Custom Essays | Research Papers

Anonymous said...

Great Article as share good stuff with good ideas and concepts, lots of great information and inspiration, both of which we all need, thanks for all the enthusiasm to offer such helpful information here
link building service | social bookmark submission| Article Submission Service | Manually directory Submission Service | press release distribution

hayda said...

What a helpful post really will be coming back to this time and time again. mirc . chat . chat sohbet . mirc sohbet . cinsellik . cinsel sohbet . cinsellik sohbet . ask sohbet .Thanks ..

daoes said...

These kind of post are always inspiring and I prefer to read quality content Car Insurance
so Zynga Poker Bot | Top health plans happy to find many good point here in the post, writing is simply great, thank you for the post

Logo designs said...

Interesting post, Keep up the good work:
Logo Designs , Logo Design

www.4xindia.com said...

A lot of great info and ideas in this post, thanks for bringing this to us!
Jenny Parker

Thesis Writers said...

this kind of blog always useful for blog readers, it helps people during research. your post is one of the same for blog readers.

Thesis paper Writers

logo design - logoinn said...

What a helpful post really will be coming back to this time and time again.
Logo Designs | Logo Design

Thesis Help said...

Great thesis help for the students for thesis paper.

Essay Thesis said...

Helpful post. Very informative. thesis statement

aai333 said...
Unknown said...
Unknown said...
Unknown said...
ai said...

polo boots
It's all about fierce glamour with high octane gloss and lashings of sparkle as fabrics go metallic with shimmering luxe finishes. Forpolo shoes
, gloriously excessive embellishment is absolutely key, championed at cheap herve leger outlet
and Elie Saab. Just remember one simple rule: Too much is not enough
Lightening bolts of acid brights emphasised by herve leger outlet
insatiable mood for dark tones, discount herve leger 2010shake up the catwalks for an unexpected twist to the season. Flashes of fuchsia, and minimalist cobalt come in the form of newest herve leger and statement dresses for a bold, dynamic fashion direction.
