04 August 2006

Future DUC Tasks

The Document Understanding Conference (DUC) features a yearly summarization competition. For the past few years, the task has been query-focused summarization of clusters of (essentially entirely) news documents. There will be a pilot task next year and, based on comments made during DUC 2006, it appears it will be one of the following:

  1. Multidocument, (probably) query-focused summarization of blog posts.
  2. Multidocument summarization of news, with respect to known information.

The idea in (1) is that there are several "novel" aspects one has to deal with.  First, blog posts are out of domain for most parsers and similar tools, which means we'll get noisy input, though not as noisy as speech.  Second, although the blog posts (which would come from the TREC blog collection) will essentially all focus on news topics (sadly, NLPers is not in the corpus), they are almost certainly more emotionally fueled than vanilla news.  The identification of sentiment and opinion, both of which are in vogue these days, will potentially become more useful.

The idea in (2) is that in most real-world situations, the user who wants the summary already has some background information on the topic.  The summarization engine would be handed a collection of 5-10 documents that the user has presumably read, followed by 5-10 new documents to be summarized.  The novel aspect of this task is, essentially, detecting novelty.

Personally, I think both are potentially interesting, though not without their drawbacks.  The biggest potential problem I see with the blogs idea is that I think we're reentering the phase of not being able to achieve any sort of human agreement without fairly strict guidelines.  If, say, two viewpoints are expressed, it's unclear how a summary should reflect them.  The biggest problem I see with idea (2) is that it is very reminiscent of some TREC-style tasks, like TDT, and I'm not sure it will require anything more than essentially doing normal query-focused summarization with an MMR-style term to account for "known information."  That's not to say these aren't worth exploring -- I think both are quite interesting -- but, as always, we should be careful.
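
To make the "MMR-style term" concrete, here is a minimal sketch of what I have in mind. It is only an illustration under my own assumptions, not a description of any DUC system: sentences are compared with bag-of-words cosine similarity, the function name `mmr_summary` and the parameters `k` and `lam` are made up for this example, and the "known information" is handled simply by seeding the redundancy set with the sentences the user has already read.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (an assumed, very crude representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_summary(query, candidates, known, k=2, lam=0.7):
    """Greedy MMR-style selection of up to k candidate sentences.

    Each candidate is scored as
        lam * sim(candidate, query) - (1 - lam) * max sim(candidate, seen)
    where `seen` starts out as the known (already read) sentences and grows
    with every sentence added to the summary.
    """
    q = bow(query)
    cand_vecs = [bow(s) for s in candidates]
    seen = [bow(s) for s in known]  # hypothetical handling of "known information"
    remaining = list(range(len(candidates)))
    chosen = []

    while remaining and len(chosen) < k:

        def score(i):
            redundancy = max((cosine(cand_vecs[i], v) for v in seen), default=0.0)
            return lam * cosine(cand_vecs[i], q) - (1 - lam) * redundancy

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        seen.append(cand_vecs[best])  # later picks are penalized against this one

    return [candidates[i] for i in chosen]
```

The only change from ordinary query-focused MMR is the initial value of `seen`; with an empty `known` list this reduces to the usual formulation. Whether the real task needs more than this is exactly the open question.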

