23 May 2006

Penn Discourse TreeBank

I visited Penn last November and, while talking with Joshi, I found out about a recent annotation effort from Penn/LDC: the Penn Discourse Treebank. (There is also an upcoming tutorial at ACL about the effort.) The PDTB is quite different from the discourse resources I'm familiar with (e.g., the RST treebank). The interesting aspect of the PDTB is that discourse connectives (whether they appear explicitly or are implicit) are treated as a sort of predicate between two arguments, much like one would expect for verbs.
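To make the predicate analogy concrete, here is a minimal Python sketch of a connective treated as a two-place predicate. The class name, field names, and example sentences are hypothetical and chosen only for illustration; this is not the PDTB's actual annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical record: a connective relating two text spans."""
    connective: str  # e.g. "because"; for an implicit relation, an inferred connective (assumption)
    explicit: bool   # True if the connective actually appears in the text
    arg1: str        # first argument span
    arg2: str        # second argument span

    def as_predicate(self) -> str:
        # Render like a verb-style predicate; "*" marks an implicit connective.
        marker = "" if self.explicit else "*"
        return f"{self.connective}{marker}({self.arg1!r}, {self.arg2!r})"


explicit_rel = DiscourseRelation("because", True, "The game was cancelled", "it rained")
implicit_rel = DiscourseRelation("so", False, "It rained", "the game was cancelled")

print(explicit_rel.as_predicate())
print(implicit_rel.as_predicate())
```

The point of the sketch is only that each relation is local: it names a connective and its two arguments, and nothing forces the relations in a text to assemble into a single tree.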

As someone who is interested in problems that break the sentence boundary, I find many discourse-related resources and much of the related work interesting. One nice thing about the PDTB is that, since not every sentence is required to relate to another (a relation is annotated only when it is clear), human agreement is high and many unnatural decisions made in the RST treebank are avoided (e.g., forcing things to be a tree when they shouldn't be). One not-so-nice thing about the PDTB is that not every sentence is required to relate to another. In a theory of discourse, I am looking for something that tells me why this blog post is a text and why a string of largely unrelated sentences is not. The PDTB-style annotation does not give us this. Because of this, it is hard for me to imagine many applications for which it would be incredibly useful.

In case anyone more familiar with the PDTB group reads this blog, I'm curious if anyone has proposed baseline models and evaluation criteria for this problem. I have some ideas if there isn't anything out there, but I don't want to step on toes...

2 comments:

hal said...

Thanks, Nikhil. I was, in fact, wondering why there were so few implicit relations annotated... you answered my unasked question! Do you know if there are any thoughts of moving up to a higher-level discourse structure (tree- or TAG-based or otherwise)? I really like the idea of grounding discourse in elements that look a lot like predicates: it makes explicit the role that discourse is playing in semantic interpretation. Though I fear that as you move to a higher level, the semantics of these (likely implicit) predicates will become more complex, harder to describe, and will look very much like RST satellite relations (contrast, evaluation, etc.).
