natural language processing blog: Humor is Hard

23 September 2006

Humor is Hard

Several months ago I became temporarily interested in trying to automatically identify whether entries in online discussions are informative, interesting, humorous, etc. (This was somewhat in the context of a summarization sort of system, but the problem seems more generic.) It turns out that in the comments section of Slashdot, people manually tag comments into such categories. I spent a few weeks crawling Slashdot (eventually getting my IP banned because this is apparently not allowed) and grabbed a few thousand stories and associated comments. I spent a few hours building a straightforward classifier based on the comment labels. It turns out that the funny comments are among the hardest to classify correctly.
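
To make "a straightforward classifier based on the comment labels" concrete, here is a minimal sketch of one possibility: a bag-of-words Naive Bayes model. This is an assumption for illustration, not necessarily the classifier used in the experiment above; the class name, the tokenizer, and the toy training comments are all made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()            # comments seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()

    def fit(self, comments, labels):
        for text, label in zip(comments, labels):
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for word in words:
                score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this sketch; real input would be crawled comments
# paired with their moderation tags.
train_comments = [
    "the kernel patch fixes the memory leak",
    "benchmarks show the new driver is faster",
    "lol my toaster runs linux",
    "lol in soviet russia the joke tells you",
]
train_labels = ["Informative", "Informative", "Funny", "Funny"]
model = NaiveBayes().fit(train_comments, train_labels)
```

A model like this only sees which words appear, which is exactly why it has so little to go on when the joke depends on outside knowledge.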

In general, I think identifying humor (or attempted humor) is a very hard problem. It seems to almost require a substantial amount of world knowledge and inference capabilities, since humorous comments are rarely signalled by straightforward lexical cues (though having three exclamation points or a smiley is a good indicator, these actually occur surprisingly rarely).
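
As a rough illustration of what such lexical cues look like as features, here is a small sketch. The two regular expressions and the function names are assumptions made for this example; the smiley pattern covers only a handful of common forms.

```python
import re

# Hypothetical cue patterns: the only cues named above are
# "three exclamation points" and "a smiley".
TRIPLE_BANG = re.compile(r"!{3,}")
SMILEY = re.compile(r"(?<!\w)[:;]-?[)DP](?!\w)")  # e.g. :) ;-) :D :P


def surface_cues(comment):
    """Return which of the two surface cues appear in a comment."""
    return {
        "triple_exclamation": bool(TRIPLE_BANG.search(comment)),
        "smiley": bool(SMILEY.search(comment)),
    }


def cue_coverage(comments):
    """Fraction of comments that show at least one cue."""
    if not comments:
        return 0.0
    hits = sum(1 for c in comments if any(surface_cues(c).values()))
    return hits / len(comments)
```

Running `cue_coverage` over the comments labeled funny would give a quick estimate of how rarely these cues fire, which is the observation made above.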

To get a sense of why this is so hard, let's look at some examples. These were grabbed from Slashdot two days ago (the 21st).

In one article titled Motorola Unveils Phone Vending Machines (which talks about how you can buy cell phones from vending machines and how they are delivered by robotic arm rather than dropping à la sodas), we have the following comments marked humorous: "can i use the cell phones I want to purchases to purchases the cell phone I am purchasing?" and "I have a hard enough time trying to pull a big old stuffed animal out with those robotic arms much less a tiny tiny phone. At 50 bucks a pop rather than 50 cents, I'm going to waste a lot of money."

In another article about Googling for ATM Master Passwords, we have the following comments. "[Subj: The default password is...] I thought it was up, up, down, down, left, right, left, right, B, A, Start ..." (for those not of my generation, this is the cheat code for the NES game Contra and several other Konami games). Additionally, in response to "Whoever makes these ATMs deserves all the bad publicity that they get." someone comments "Might it be Diebold, by any chance?"

Finally, in commenting about the article Fish Work as Anti-terror Agents (which discusses how fish like the bluegill help detect poisonous substances in water supplies), we get comments like "In Australia, we have stingrays guarding us from pests." and "How do we know this isn't a red herring by some terroist group?" and finally "Does this mean we can carry water bottles on planes again -- if they have bluefish swimming in them?"

You may take issue with the degree to which these comments are funny, but regardless of whether they actually are funny, they certainly were intended to be funny.

What I find fascinating about all these examples is that they're essentially playing the game of drawing surprising comparisons between the article at hand and other common knowledge. For instance, the "robotic arms" comment is based on our shared experience of failing at fairs to get stuffed animals. The stingray comment refers to Steve Irwin's recent death, and the water bottle joke refers to the new airline policies. While some (e.g., the water bottle joke) are perhaps easy to identify because they seem "off topic" somehow, others (like the Diebold comment or the stingray comment) really are on topic for the article, but just play against some alternative story that we're all expected to know.

I'm not sure what my conclusion is, but if you're out there looking for a really hard text classification problem for which it at least seems that a lot of knowledge and inference is required, you may find humor detection fun.

6 comments:

Anonymous said...: I've always wondered what relation can there be between CL and philosophy of language. There's an interesting paper by philosopher Donald Davidson about metaphors. Basically he says that they are not a linguistic entity (although they contain one). The same would apply to irony and humour, I guess. The article is named "What Metaphors Mean" and it appeared in Critical Inquiry 1978.
(this relates also to hal's previous post about the different NLP/CL/cogNLP tasks); 25 September, 2006 04:11
Kevin Duh said...: Suppose we do develop a good humor classification system. What would one do with it? What are some good applications? I don't have anything in mind...; 25 September, 2006 17:59
hal said...: For "serious" work, I would imagine it would be useful to just strip all humorous comments. E.g., if you wanted to produce "serious" summaries of topics that are discussed in blogs, you would probably not want to include parts of jocular replies. Given that the raw classification problem seems so hard, it's easy to imagine that these would just creep into extractive summaries unless you explicitly guard against it.

I suppose this would cause problems with things like the Daily Show and Colbert Report: since they aim (sort of) to be half-serious, there are certainly cases where important information is conveyed through humor. But I don't think this actually happens much. If nothing else, humorous statements, even if true and factual and "serious", are rarely non-opinionated (I would guess).; 26 September, 2006 08:51
Anonymous said...: hi dear
i'm mahdi from iran, i'm computer student and wanna to research about natural language processing for my final project, i love this topic and if you help me in this field i will thank you, we can be a good friend and we can work with each other to about NLP, this is my email address mahdishaytoon@yahoo.com
thank you
mahdi; 26 September, 2006 12:55
Ettsem said...: One definition of humour I find helpful goes like this: "Humour is the surprise revelation of an unexpected connection." (I don't remember who said that.)

A further elaboration (from Ramachandran, I believe) suggests that humour makes us "amused" because we perceive an utter lack of threat in the new information; the laughter is expiation. (The author Larry Niven once described humour as "an interrupted defense mechanism".)

I wrote an "intelligent writing" detector once, and that was fairly easy. But I think it would be very hard indeed to create a humour detector. How would a computer program figure out that it was supposed to be surprised but not threatened?; 09 October, 2006 10:15
