Discourse is an area I've been peripherally interested in for quite a while; probably unduly influence by my advisor's (ex-?)interest in the problem. The general sense I get is that discourse relations are relatively easy to identify when cue-phrases exist. This is essentially the approach taken by the Penn Discourse Treebank and Daniel's original parser. Unfortunately, cue phrases exist only for a small subset of sentences, so it's tempting to look at lexical techniques, since they work so well in other problems. The intuition is that if one sentence says "love" and the next says "hate," then maybe they are in some sort of contrastive relation. Unfortunately, the world is rarely this nice. The amount of inference necessary to reliably identify seemingly simple, coarse-grained relationships, seems astonishing.
Here are some examples of contrast relations:
- The interest-only securities will be sold separately by BT Securities.
The principal-only securities will be repackaged by BT Securities into a Freddie Mac Remic, Series 103, that will have six classes.
- James Earl Jones is cast to play the Rev. Mr. Johns.
The network deals a lot with unknowns, including Scott Wentworth, who portrayed Mr. Anderson, and Bill Alton as Father Jenco.
- At the same time, second-tier firms will continue to lose ground.
In personal computers, Apple, Compaq and IBM are expected to tighten their hold on their business.
Next, let's consider background. Here are some short examples of background relations:
- The March 24 oil spill soiled hundreds of miles of shoreline along Alaska's southern coast and wreaked havoc with wildlife and the fishing industry.
Exxon Corp. is resigning from the National Wildlife Federation 's corporate advisory panel, saying the conservation group has been unfairly critical of the Exxon Valdez oil spill along the Alaskan coast.
- Insurers typically retain a small percentage of the risks they underwrite and pass on the rest of the losses.
After Hugo hit, many insurers exhausted their reinsurance coverage and had to tap reinsurers to replace that coverage in case there were any other major disasters before the end of the year.
As a final example, let's look at some evidence relations:
- A month ago, the firm started serving dinner at about 7:30 each night; about 50 to 60 of the 350 people in the investment banking operation have consistently been around that late.
Still, without many actual deals to show off, Kidder is left to stress that it finally has "a team" in place, and that everyone works harder.
- Yesterday, Mobil said domestic exploration and production operations had a $16 million loss in the third quarter, while comparable foreign operations earned $234 million.
Individuals familiar with Mobil's strategy say that Mobil is reducing its U.S. work force because of declining U.S. output.
I feel like, in many ways, this discourse relation problem subsumes the now-popular "textual entailment" problem. In fact, some of the RST relations look a lot like textual entailment: elaboration-general-specific, eleboration-set-member, Contrast (as a negative entailment), consequence, etc.
I don't have a strong conclusion here other than: this is a hard problem. There've been some attempts at learning lexical things for this task, but I feel like it needs more than that.