05 April 2007

Discourse is darn hard

Discourse is an area I've been peripherally interested in for quite a while; probably unduly influence by my advisor's (ex-?)interest in the problem. The general sense I get is that discourse relations are relatively easy to identify when cue-phrases exist. This is essentially the approach taken by the Penn Discourse Treebank and Daniel's original parser. Unfortunately, cue phrases exist only for a small subset of sentences, so it's tempting to look at lexical techniques, since they work so well in other problems. The intuition is that if one sentence says "love" and the next says "hate," then maybe they are in some sort of contrastive relation. Unfortunately, the world is rarely this nice. The amount of inference necessary to reliably identify seemingly simple, coarse-grained relationships, seems astonishing.

Here are some examples of contrast relations:

  1. The interest-only securities will be sold separately by BT Securities.
    The principal-only securities will be repackaged by BT Securities into a Freddie Mac Remic, Series 103, that will have six classes.
  2. James Earl Jones is cast to play the Rev. Mr. Johns.
    The network deals a lot with unknowns, including Scott Wentworth, who portrayed Mr. Anderson, and Bill Alton as Father Jenco.
  3. At the same time, second-tier firms will continue to lose ground.
    In personal computers, Apple, Compaq and IBM are expected to tighten their hold on their business.
In (1), we need to know that interest-only and principal-only securities are essentially themselves contrastive. We then need to combine this information with the coreference of BT Securities in both cases (should be easy). This seems like maybe we could possibly get it with enough data. For (2), we essentially need to know that James Earl Jones is famous, and that "deals with unknowns" is the same as "casts." For (3), we need to know that Apple, Compaq and IBM are not second-tier firms. These latter two seem quite difficult.

Next, let's consider background. Here are some short examples of background relations:
  1. The March 24 oil spill soiled hundreds of miles of shoreline along Alaska's southern coast and wreaked havoc with wildlife and the fishing industry.
    Exxon Corp. is resigning from the National Wildlife Federation 's corporate advisory panel, saying the conservation group has been unfairly critical of the Exxon Valdez oil spill along the Alaskan coast.
  2. Insurers typically retain a small percentage of the risks they underwrite and pass on the rest of the losses.
    After Hugo hit, many insurers exhausted their reinsurance coverage and had to tap reinsurers to replace that coverage in case there were any other major disasters before the end of the year.
The first example here might be identifiable by noting that the tense in the two sentences is different: one is present, one is past. Beyond this, however, we'd essentially have to just recognize that the oil spill referred to in both sentences is, in fact, the same spill. We'd have to infer this from the fact that otherwise the first sentence is irrelevant. In (2), we'd have to recognize that the "passed on" losses go to reinsurers (I didn't even know such things existed).

As a final example, let's look at some evidence relations:
  1. A month ago, the firm started serving dinner at about 7:30 each night; about 50 to 60 of the 350 people in the investment banking operation have consistently been around that late.
    Still, without many actual deals to show off, Kidder is left to stress that it finally has "a team" in place, and that everyone works harder.
  2. Yesterday, Mobil said domestic exploration and production operations had a $16 million loss in the third quarter, while comparable foreign operations earned $234 million.
    Individuals familiar with Mobil's strategy say that Mobil is reducing its U.S. work force because of declining U.S. output.
These examples also seem quite difficult. In (1), the fact that the "team" is supposed to refer to the "50 to 60 ... people" and that having dinner, within the context of a company, is a "team-like" activity. (2) is somewhat easier -- knowing that a "loss" is related to a "decline" helps a lot, though it is entirely possible that the type of relation would differ. You also pretty much have to know that "domestic" means "U.S."

I feel like, in many ways, this discourse relation problem subsumes the now-popular "textual entailment" problem. In fact, some of the RST relations look a lot like textual entailment: elaboration-general-specific, eleboration-set-member, Contrast (as a negative entailment), consequence, etc.

I don't have a strong conclusion here other than: this is a hard problem. There've been some attempts at learning lexical things for this task, but I feel like it needs more than that.

7 comments:

Anonymous said...

... we'd have to recognize that the "passed on" losses go to reinsurers (I didn't even know such things existed).

My father works in insurance. He has told me that this can even loop around, i.e., the insurers get reinsured, the reinsurers get reinsured, etc., until the original insurer becomes a reinsurer of a small part of whatever it was that they were insuring in the first place!! This may sound ridiculous, but it happens because policies are often reinsured in packages in which parts may link back to the companies original policy.

Not really relevant for NLPERs, but I found this amusing when I first heard it.

Kevin Duh said...

Hal--do you have some good references for learning about discourse for someone who knows nothing about it? (Thanks in advance!)

Ryan--Is this loopy reinsurance business pretty common? It sounds like it won't converge... :)

hal said...

kevin -- daniel ran a discourse structure reading group my first semester at ISI; it has a good list of papers. i'd say mann+thompson, grosz+sidner and hobbs90 are the three "must reads." for more "practical" matters, i'd suggest strube+hahn and hearst97 (all refs are on the link). for discourse parsing stuff, one of daniel's early papers (96-97ish) would probably be a good place to start.

Anonymous said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

Anonymous said...

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要打工兼差打工,兼差,八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店日領上班小姐,水水們如果想要擁有打工工作、晚上兼差工作兼差打工假日兼職兼職工作酒店兼差兼差打工兼差日領工作晚上兼差工作酒店工作酒店上班酒店打工兼職兼差兼差工作酒店上班等,想了解酒店相關工作特種行業內容,想兼職工作日領假日兼職兼差打工、或晚班兼職想擁有鋼琴酒吧又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

艾葳酒店經紀是合法的公司工作環境高雅時尚,無業績壓力,無脫秀無喝酒壓力,高層次會員制客源,工作輕鬆,可日領現領
一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

水水們妳有缺現領、有兼職缺錢便服店的煩腦嗎?想到日本留學缺錢嗎?妳是傳播妹??想要擁有高時薪又輕鬆的賺錢,酒店和,假日打工,假日兼職賺錢的機會嗎??想實現夢想卻又缺錢沒錢嗎!??
艾葳酒店台北酒店經紀招兵買馬!!徵專業的酒店打工,想要去酒店的水水,想要短期日領,酒店日領,禮服酒店,制服店,酒店經紀,ktv酒店,便服店,酒店工作,禮服店,酒店小姐,酒店經紀人,
等相關服務 幫您快速的實現您的夢想~!!

Anonymous said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Unknown said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex