14 February 2006

Tutorial: Bayesian Techniques for NLP

I'm giving a tutorial on Bayesian methods for NLP at HLT-NAACL 2006. I gave a similar tutorial about a year ago here at ISI. This gave me a pretty good idea of what I want to keep in and what I want to cut out. The topics I intend to cover are, roughly:

  1. Bayesian paradigm: priors, posteriors, normalization, etc.
  2. Graphical models, expectation maximization, non-Bayesian inference techniques
  3. Common statistical distributions: uniform, binomial/multinomial, beta/Dirichlet
  4. Simple inference: integration, summing, Monte Carlo
  5. Advanced inference: MCMC, Laplace, variational
  6. Survey of popular models: LDA, Topics and Syntax, Words and Pictures
  7. Pointers to literature

All of this is, of course, cast in the context of NLP problems: all discrete distributions, language applications, etc., that hopefully both NLP and IR people will find interesting (maybe even some speech people, too).
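
To give a flavor of topics 1 and 3, here is a minimal sketch of the simplest discrete case: a multinomial over words with a symmetric Dirichlet prior. It is an illustration only, not material taken from the tutorial. The toy vocabulary, the sentence, the function names, and the choice alpha = 1 are all assumptions made up for the example.

```python
from collections import Counter

def mle(counts, vocab):
    """Maximum-likelihood estimate: relative frequencies."""
    total = sum(counts[w] for w in vocab)
    return {w: counts[w] / total for w in vocab}

def posterior_mean(counts, vocab, alpha=1.0):
    """Posterior mean of a multinomial under a symmetric Dirichlet(alpha) prior.

    By conjugacy the posterior is Dirichlet(alpha + count_w), whose mean is
    (count_w + alpha) / (N + alpha * |V|).
    """
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

# Hypothetical toy data: four observed tokens, one vocabulary word unseen.
vocab = ["the", "cat", "sat", "mat"]
counts = Counter("the cat sat the".split())

mle_est = mle(counts, vocab)
post = posterior_mean(counts, vocab, alpha=1.0)

for w in vocab:
    print(f"{w:>4}  mle={mle_est[w]:.3f}  posterior mean={post[w]:.3f}")
```

With so little data, the maximum-likelihood estimate gives the unseen word "mat" probability zero, while the posterior mean keeps some mass on it. That difference shrinks as the counts grow.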

Does anyone have anything they'd really like to hear that's not on the list? Or anything that's on the list that they don't care about? Keep in mind several constraints: 3 hours (minus coffee time), generally accessible, focused on NLP applications, and something I know something about. (For instance, I covered expectation propagation in the tutorial last year, but decided to cut it for this to give more time to other issues.) Note that I am also preparing a written tutorial that covers roughly the same material.


Kevin Duh said...

How about a discussion of NLP areas where Bayesian methods may work? This might encourage future work in Bayesian-NLP.

hal said...

indeed a difficult question!

i have mixed feelings. i tend to think that Bayesian techniques really shine in unsupervised NLP settings --- it's just so easy to get good discriminative methods to work well for the supervised problems.

one argument made at the workshop is that Bayesian techniques will work well whenever there is insufficient data. given the nlp mantra "there's no data like more data" it would seem that this is every problem. i don't believe this. at least not once you factor in the computation issue (a perceptron is just soooo fast).

Anonymous said...

As a newbie in NLP, I would be interested in this question: what characteristics of an NLP problem make a Bayesian approach a good idea?
But what would be much more interesting is: what characteristics make a Bayesian approach a *bad* idea?
Is it just computation time?

hal said...

i think computation time is an issue, but perhaps not the biggest one (variational EM or EP or even well implemented collapsed Gibbs/MH are often not that much slower than vanilla EM). i think (and this is essentially the message of the tutorial) that any time you're using EM, you should consider a Bayesian model instead. especially when it is difficult to exactly specify a model structure and you would like to allow more variability (i.e. a prior probability instead of a 0/1 decision), or when the space of models is enormous in comparison to the size of the data set, it's probably worth a try.

Delip Rao said...

Could you please share the slides of your HLT-NAACL tutorial?

Anonymous said...

Little knowledge is dangerous! Statistical NLP is like statistical cognitive neuroscience: both will achieve nothing, because the problem at hand is way beyond a couple of formulae.