20 December 2005

Summarization

Summarization is one of the canonical NLP problems, but of all the big ones (MT, speech, IR, IE, etc.) it is in my opinion the most difficult (let the flames begin!). I think it's so hard because it's unclear what a summary is. When one cannot define a problem well, it is impossible to solve, and impossible ⊆ difficult. There has been an enormous amount of effort to define specific summarization problems that we can solve, a comparable amount of effort on figuring out how to measure how good a solution is, and a lot of effort on building models/systems that can solve these problems. We've tried many things; see the history of DUC for a small, biased, incomplete set.

That said, I think the field has much potential, but not necessarily by trying to mimic what a human would do when asked to produce a summary. I'm interested in doing summarization-like tasks that a human would never be able to do reliably. Here are some examples to give a flavor of what I'm talking about:


  • Let me go to a scientific engine like Rexa or CiteSeer and ask for a summary of "reinforcement learning." The system should know what papers I've read, what papers I've written, the relationship between all the papers in its database, an analysis of the goodness of authors and so on. What it produces for me must be markedly different from what it would produce for Satinder Singh.

  • Let me go to Amazon and ask about new digital cameras. Maybe I'm interested in spending $200-$300. It should know that I've never bought a digital camera before, or that I've bought 4. I want a summary of important specifications, user comments and so on.

  • Let me go to my own inbox and ask for a summary of what I've been discussing with John recently.



One can imagine many more similar tasks, but these are three obvious ones. The nice thing about these is that even partial solutions would be enormously useful (to me, at least...my mom might not care about the first). These are also things that people really can't do well. If someone asks me for something like the first one but, say, on structured prediction instead of reinforcement learning, I can give a summary, but it will be heavily biased. It is worse for the second, where I can basically only produce anecdotal evidence, and virtually impossible for the third.

The most important outstanding issue is how to measure success at such problems. I cannot imagine how to do this without doing user studies, but people probably felt the same way about MT a few years ago. How about now? Still, given the amount of personalization in these tasks, I feel that automatic evaluation would be harder here. Probably the most important things to measure in user studies are subjective satisfaction, how many times multiple searches had to be performed, and so on. One could also take a TREC-style approach to comparative pairwise evaluation by marking system 2 down if it missed something system 1 found that a human thought was important.
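To make the pairwise idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the function name `pairwise_penalties`, the representation of each system's output as a set of item labels, and the toy camera data are all made up, and a real evaluation would need human judges to produce the `important` set.

```python
from typing import Dict, Set


def pairwise_penalties(found: Dict[str, Set[str]], important: Set[str]) -> Dict[str, int]:
    """Count, per system, the important items it missed that another system found.

    `found` maps a (hypothetical) system name to the items it surfaced.
    `important` is the set of items a human judged to be important.
    """
    penalties = {}
    for name, items in found.items():
        # Everything surfaced by at least one competing system.
        found_by_others = set().union(
            *(other for n, other in found.items() if n != name)
        )
        # Penalize only items that are important, found elsewhere, and missed here.
        penalties[name] = len((found_by_others & important) - items)
    return penalties


# Toy data, invented for this example.
found = {
    "system1": {"price", "battery life", "zoom"},
    "system2": {"price", "color"},
}
important = {"price", "battery life", "zoom"}

penalties = pairwise_penalties(found, important)
print(penalties)  # {'system1': 0, 'system2': 2}
```

Note that this only scores relative misses: an important item that no system found costs nobody anything, which is exactly the weakness of a purely comparative evaluation.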

There are also tons of subproblems that can be pulled from this tangle of tasks. Most notably, personalization methods, social network analysis methods, redundancy identification, coherence, information presentation (UI) techniques, generation of multimodal outputs (tables, graphs, etc.), dealing with imperfect input (googling for "reinforcement learning" also produces irrelevant documents), opinion identification, processing of ungrammatical input, anti-spam, and so on. I'm not a huge proponent of just solving subtasks that aren't proven to be necessary, but it is sometimes helpful to take this approach. I think we just have to keep the big picture in our minds.

28 comments:

  1. Isn't summarization in essence information extraction with fancy generation at the end? And what good is that generation?

    Here is an idea: Why not pose some grand challenges for information extraction, such as biography generation, argument extraction from political discussions, trendspotting, etc.

  2. IE typically focuses on filling pre-defined templates. Summarization attempts to be broader: in addition to pulling out information in a wider range than, say, natural disasters, it also attempts to appraise novelty, interestingness and importance. Ed has suggested that one way to do summarization is to create 1000 IE templates and build a good IE system. This would be one way to go, but I don't think it would work for the application list I care about (maybe it would be okay for news). But yes, I think things like argument extraction and trendspotting are really cool things to work on in the same general area.

  8. Summary is the need of today's generation. Everyone wants a short and quick description.
  13. Interesting post about summarization. I would like to know more about this field
