23 May 2006

Penn Discourse TreeBank

I visited Penn last November and, while talking with Joshi, found out about a recent annotation effort from Penn/LDC: the Penn Discourse Treebank (PDTB). (There is also an upcoming tutorial at ACL about the effort.) The PDTB is quite different from the discourse resources I'm familiar with (e.g., the RST treebank). The interesting aspect of the PDTB is that discourse connectives (whether they appear explicitly or are implicit) are treated as a sort of predicate over two arguments, much like one would expect for verbs.
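To make the "connective as predicate" idea concrete, here is a minimal sketch of how one such relation might be held in memory. The class name, field names, and example sentence are all my own illustrative assumptions; this is not the PDTB file format or any official API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical container: a connective acting as a two-place predicate."""
    connective: str        # e.g. "because"; for implicit relations, the connective an annotator supplies
    arg1: str              # text span of one argument (illustrative)
    arg2: str              # text span of the other argument (illustrative)
    explicit: bool = True  # False when the connective does not appear in the text

    def as_predicate(self) -> str:
        # Render the relation in predicate-argument notation, like a verb frame.
        tag = "" if self.explicit else "[implicit]"
        return f"{self.connective.upper()}{tag}({self.arg1!r}, {self.arg2!r})"


# "The picnic was cancelled because it rained."
rel = DiscourseRelation("because", "the picnic was cancelled", "it rained")
print(rel.as_predicate())
```

The point of the sketch is only that the connective plays the role a verb plays in a predicate-argument structure, with an extra flag for whether it was actually present in the text.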

As someone who is interested in problems that break the sentence boundary, I find many discourse-related resources and much of the related work interesting. One nice thing about the PDTB is that, since not every sentence is required to relate to another (a relation is annotated only when it is clear), human agreement is high and many unnatural decisions made in the RST treebank are avoided (e.g., forcing things to be a tree when they shouldn't be). One not-so-nice thing about the PDTB is that not every sentence is required to relate to another. In a theory of discourse, I am looking for something that tells me why this blog post is a text and why a string of largely unrelated sentences is not. The PDTB-style annotation does not give us this. Because of this, it is hard for me to imagine many applications for which it is incredibly useful.

In case anyone more familiar with the PDTB group reads this blog, I'm curious if anyone has proposed baseline models and evaluation criteria for this problem. I have some ideas if there isn't anything out there, but I don't want to step on toes...


Nikhil Dinesh said...

I do some work with the PDTB group at Penn. Let me preface this by saying I agree with most of what you have to say. Just thought I'd add a couple of things:

1. The DLTAG theory on which the PDTB is based is a theory of low-level discourse structure. While there is no doubt that there may be higher-level relations (for example, of the rhetorical kind), these are beyond the purview of the PDTB.

2. In the PDTB, every sentence is related to prior material in the same paragraph, via the use of "implicit connectives". This is done for sentences that have no connective relating them to the prior discourse. The annotator chooses the connective that best describes the relationship of the current sentence to a prior one.

The choice of the paragraph as a self-contained unit (for implicit relations) was a purely practical decision, as we weren't sure at the time what problems annotators would run into.

One of the things that comes up is that on occasion a connective doesn't express the relation between sentences. Sometimes a sentence just gives more information about some entity in the discourse (entity-based coherence), or there is some other phrase relating the two sentences, like "for this reason".

Implicit connectives are available only for sections 08, 09, and 10 in the first release. The rest will be added in V2.

hal said...

Thanks, Nikhil. I was, in fact, wondering why there were so few implicit relations annotated... you answered my unasked question! Do you know if there are any thoughts of moving up to a higher-level discourse structure (tree- or TAG-based, or otherwise)? I really like the idea of grounding discourse in elements that look a lot like predicates: it makes explicit the role that discourse is playing in semantic interpretation. Though I fear that as you move to a higher level, the semantics of these (likely implicit) predicates will become more complex and harder to describe, and will look very much like RST satellite relations (contrast, evaluation, etc.).

Nikhil Dinesh said...

As far as I know, there has been no discussion about higher-level discourse relations in DLTAG. I echo your fear of the high-level! It seems to me that resolving scope and anaphora issues might be an easier way to ensure progress in the short term.
