12 December 2007

NIPS retrospective

I got back from NIPS on Sunday; sadly I got sick on the last day. Despite this, it was by far my most productive conference ever. Admittedly, I made a point to try to make it productive (being somewhat geographically isolated means that it's in my best interest), but even so, I had a fantastic time. There are several ideas that I plan on blogging on in the near future, but for now, I'll focus on the actual conference program. The standard disclaimer applies: I didn't see everything and just because something's not listed here doesn't mean I didn't enjoy it. I'll try to limit my commentary to papers that are relevant to the NLP audience, but I may stray. I'll conclude with a bit of my sense of what was in and out this year.

  1. Kristina Toutanova and Mark Johnson had a topic-model style paper for doing semisupervised POS tagging. There are two new things here. First, they actually model context (yay!) which we know is important in language. Second, they introduce a notion of a confusion class (what possible POS tags can a word get) and actually model this. This latter is something that makes sense in retrospect, but is not obvious to think of (IMO). They actually get good results (which is non-trivial for topic models in this context).
  2. Alex Bouchard-Cote and three more of the Berkeley gang had a paper on language change. If you saw their paper at ACL, they're attacking a pretty similar problem, by looking at phonological divergences between a handful of languages. I'd love to be able to reconcile their approach with what I've been working on---I use a huge database of typological knowledge; they use word forms...I'd love to use both, but I really like working with thousands of languages and it's hard to find corpora for all of these :).
  3. John Blitzer had a good paper (with other Penn folks) on learning bounds for domain adaptation. If you care at all about domain adaptation, you should read this paper.
  4. Alex Kulesza and Fernando Pereira had a great paper on what happens if you try to do structured learning when the underlying prediction (inference) algorithm is approximate. They show that things can go very wrong in widely different ways. This is a bit of a negative results paper, but it gives us (a) a strong sense that proceeding with a generic learning algorithm on top of an approximate inference algorithm is not okay and (b) a few ideas of what can actually go wrong.
  5. Yee Whye Teh, Kenichi Kurihara and Max Welling continue their quest toward collapsed variational models for all problems in the world by solving the HDP. (For those who don't know, you can think of this as LDA with a potentially infinite number of topics, but of course the real case is more interesting.)
  6. Ali Rahimi and Benjamin Recht presented a very cool analysis of kernel methods when your kernel is of the form K(x,z) = f(x-z). This is the case for, eg., the RBF kernel. This is something that I've wanted to try for a while, but to be honest my poor recollection of my analysis training left me a bit ill equipped. Essentially they apply a Fourier analysis to such kernels and then represent the kernel product K(x,z) as a dot product in a new input space, where kernelization is no longer required. The upshot is that you can now effectively do kernel learning in a linear model in primal space, which is very nice when the number of examples is large.
  7. David Heckerman gave a really nice invited talk on graphical models for HIV vaccine design. What he was particularly interested in was analyzing whether standard methods of determining correlation between events (eg., "do I have HIV given that I have some genomic signal?") work well when the data points are not independent, but rather belong to, eg., some genetic population. This is effectively what Lyle Campbell and I were doing in our ACL paper last year on typological features. Curing HIV might be a bit more of an interesting application, though :). Essentially what David and his colleagues do is to build a hierarchy on top of the data points and then let this explain some of the interactions and then figure out what isn't explained by this hierarchy. In the linguistic sense, this is effectively separating historical from areal divergences. I'll have to read up more on what they do, precisely.
There were several themes that stuck out at NIPS this year, though they aren't represented in the above list. Probably the biggest one is recommender systems. Just search the titles for "matrix" and you'll come up with a handful of such systems. I have to believe that this is spurred, at least in part, by the NetFlix challenge. Another theme was deep belief networks, which even had a rogue workshop dedicated to them. I have a feeling we'll be seeing more and more of these things. A final theme was randomized algorithms, particularly as applied to large scale learning. I think this is a great direction, especially at NIPS, where, historically, things have been less focused on the large scale.

My only serious quibble with the program this year is that I saw a handful of papers (and at least one that got an oral and one a spotlight) that really fell down in terms of evaluation. That is, these papers all proposed a new method for solving task X and then proceeded to solve it. The experiments, however, contained only comparisons to algorithms that did not have access to all the information that X had. I'll give an example of something that would fall in this category but I have not seen appear anywhere: I build a structured prediction algorithm and compare only against algorithms that make independent classification decisions. I'm not a huge quibbler over comparing against the absolute state of the art, but it really really irks me when there are no comparisons to algorithms that use the same information, especially when standard (read: they are in any reasonable machine learning textbook) algorithms exist.

18 comments:

ydub said...

Another theme this year lies at the intersection of approximate inference in graphical models and linear/convex programming. There were at least two orals here and I probably missed others as well:

P. Mudigonda, V. Kolmogorov, P. Torr
An Analysis of Convex Relaxations for MAP Estimation

D. Sontag, T. Jaakkola
New Outer Bounds on the Marginal Polytope

It is unclear to me if these would in the end lead to practical algorithms, but the developments are definitely theoretically very nice.

There were also a number of other interesting convex programming or approximate inference papers:

L. Song, A. Smola, K. Borgwardt, A. Gretton
Colored Maximum Variance Unfolding

E. Sudderth, M. Wainwright, A. Willsky
Loop Series and Bethe Variational Bounds in Attractive Graphical Models

Chris Brew said...

The confusion class idea has been in POS-tagging since Julian Kupiec and Doug Cutting built their
tagger (cf. http://citeseer.ist.psu.edu/cutting92practical.html).
There it goes by the name of "ambiguity class". I am sure there are multiple effective ways of using this idea, including some new ones.

OiOi said...

I am a bit naive and am confused and confounded by the confusion class. I thought there was no dearth of POS tagging methods. Why the additional confusion?

Yoav said...

The "confusion/ambiguity class" of the word to be tagged is (at least) implicitly modeled in any recent POS tagger.

The ambiguity classes of the context words is successfully used by the SVMTool tagger (http://www.lsi.upc.edu/~nlp/SVMTool/).

Recent works on Hebrew/Arabic morphological disambiguation make use of context ambiguity classes as well.

Chris said...

"it was by far my most productive conference ever."

Just wondering, did this change your 'Costs and Benefits' analysis of conferences?

hal said...

Thanks to all for the POS pointers... I'll have to think a bit more about the ambiguity class thing... I have a vague sense that there's a difference, but I have to think more... I noticed Mark J didn't comment yet; he'd probably know more :P.

Chris: well, I consider Canada to be domestic ;), at least in the sense that it's cheap to attend. So I don't think this is a counter-example ;).

Yoav said...
This comment has been removed by the author.
Yoav said...

re ambiguity classes: the innovation presented in NIPS paper is in the reduced lexicon models, in which the unknown ambiguity class is inferred based on ortographic features.

Chris Brew said...

Re Yoav's comment. The Xerox tagger
does have a class guesser that assigns
an ambiguity class to unknown words on
the basis of
orthographic features. This is a major
contributor to its performance.

Anonymous said...

Ultima Online Gold, UO Gold, crestingwait
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
buy uo gold
lotro gold
wow gold
warhammer gold
buy aoc gold
buy aoc gold
buy aoc gold
buy aoc gold
buy aoc gold
buy aoc gold
buy aoc gold
Age of Conan Gold, AOC Gold

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

酒店上班請找艾葳 said...

艾葳酒店經紀公司提供專業的酒店經紀, 酒店上班小姐,八大行業,酒店兼職,傳播妹,或者想要打工兼差打工,兼差,八大行業,酒店兼職,想去酒店上班, 日式酒店,制服酒店,ktv酒店,禮服店,整天穿得水水漂漂的,還是想去制服店日領上班小姐,水水們如果想要擁有打工工作、晚上兼差工作兼差打工假日兼職兼職工作酒店兼差兼差打工兼差日領工作晚上兼差工作酒店工作酒店上班酒店打工兼職兼差兼差工作酒店上班等,想了解酒店相關工作特種行業內容,想兼職工作日領假日兼職兼差打工、或晚班兼職想擁有鋼琴酒吧又有保障的工作嗎???又可以現領請找專業又有保障的艾葳酒店經紀公司!

艾葳酒店經紀是合法的公司工作環境高雅時尚,無業績壓力,無脫秀無喝酒壓力,高層次會員制客源,工作輕鬆,可日領現領
一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班,酒店上班,酒店打工環境、上班條件給水水們。心動嗎!? 趕快來填寫你的酒店上班履歷表

水水們妳有缺現領、有兼職缺錢便服店的煩腦嗎?想到日本留學缺錢嗎?妳是傳播妹??想要擁有高時薪又輕鬆的賺錢,酒店和,假日打工,假日兼職賺錢的機會嗎??想實現夢想卻又缺錢沒錢嗎!??
艾葳酒店台北酒店經紀招兵買馬!!徵專業的酒店打工,想要去酒店的水水,想要短期日領,酒店日領,禮服酒店,制服店,酒店經紀,ktv酒店,便服店,酒店工作,禮服店,酒店小姐,酒店經紀人,
等相關服務 幫您快速的實現您的夢想~!!

Adi said...

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
hersey burada
sesliparti
seslisohbet odalari
Sesli adresi
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..

sesli sohbet
seslisohbet
sesli chat
seslichat
sesli sohbet sitesi
sesli chat sitesi
sesli sohpet
kamerali sohbet
kamerali chat
webcam sohbet

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex