07 May 2007

Non-linear models in NLP

If you talk to (many) "real" machine learning people, they express profound disbelief that almost everything we do in NLP is based on linear classifiers (maxent/LR, SVMs, perceptron, etc.). We only rarely use kernels and, while decision trees used to be popular, they seem to have fallen out of favor. Very few people use non-linear methods for large-scale applications. (Our NYU friends are an exception.)

There are two possible explanations for this: (1) we really only need linear models; (2) we're too lazy to use anything other than linear models (or, alternatively, non-linear models don't scale). My experience tells me that for most of our sequence-y problems (parsing, tagging, etc.), there's very little to be gained by moving to, e.g., quadratic SVMs. I even tried doing NE tagging with boosted decision trees under Searn, because I really wanted it to work nicely, but it failed. I've also pondered the idea of making small decision trees with perceptrons on the leaves, so as to account for small amounts of non-linearity. Using default DT construction technology (e.g., information gain), this doesn't seem to help either. (Ryan McDonald has told me that other people have tried something similar and it hasn't worked for them either.) Perhaps this is because IG is the wrong metric to use (there exist DTs for regression with linear models on the leaves, and they are typically learned so as to maximize the "linearness" of the underlying data, but this is computationally too expensive).
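To make the hybrid concrete, here is a minimal sketch of the decision-tree-with-perceptrons-on-the-leaves idea: a single information-gain split over binary features, with an averaged perceptron trained on each side. The function names and the deliberate depth-one restriction are mine; this illustrates the construction, not the actual experiments.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Averaged perceptron for labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    w_sum = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
            w_sum += w
    return w_sum / max(1, epochs * len(X))  # guard against an empty leaf

def info_gain(col, y):
    """Entropy reduction from splitting on one binary feature column."""
    def H(labels):
        if len(labels) == 0:
            return 0.0
        p = np.mean(labels == 1)
        if p in (0.0, 1.0):
            return 0.0
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    mask = col == 1
    n = len(y)
    return H(y) - mask.sum() / n * H(y[mask]) - (~mask).sum() / n * H(y[~mask])

def train_stump_with_perceptron_leaves(X, y):
    """Pick the single best IG split, then fit one perceptron per leaf."""
    j = int(np.argmax([info_gain(X[:, k], y) for k in range(X.shape[1])]))
    mask = X[:, j] == 1
    return j, train_perceptron(X[mask], y[mask]), train_perceptron(X[~mask], y[~mask])

def predict(model, x):
    j, w_on, w_off = model
    w = w_on if x[j] == 1 else w_off
    return 1 if w @ x > 0 else -1
```

The point of the construction is that each leaf can learn a different linear separator, buying a small, controlled amount of non-linearity; the question raised above is whether IG is the right criterion for choosing the split.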

One counter-example is the gains that people have gotten by using latent variable models (e.g., Koo and Collins), which are essentially non-linearified linear models. In a somewhat similar vein, one could consider "edge" features in CRFs (or any structured prediction technique) to be non-linear features, but this is perhaps stretching it.
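For concreteness, here is a tiny sketch (my own notation, with a hypothetical `feature_fn`) of why edge features arguably don't buy real non-linearity: a first-order chain score couples adjacent labels, but it remains a dot product, i.e., linear in the weights.

```python
import numpy as np

def chain_score(w, feature_fn, x, tags):
    """Global score of a tag sequence under a first-order model.

    feature_fn(x, t, tag, prev_tag) returns a feature vector; "edge"
    features inspect the (prev_tag, tag) pair.  That couples labels to
    each other, but the total score is still linear in the weights w.
    """
    total = np.zeros_like(w)
    prev = None
    for t, tag in enumerate(tags):
        total += feature_fn(x, t, tag, prev)
        prev = tag
    return float(w @ total)
```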

Part of this may be because over time we've adapted to using features that don't need to be non-linearified. If we went back and treated each character in a word as a single feature, and then required the learning algorithm to recover the important features (like "word is 'Bush'"), then clearly non-linearity would be required. This is essentially what vision people do, and exactly the cases where things like deep belief networks really shine. But so long as we're subjecting our learning algorithms to featuritis (John Langford's term), perhaps there's really not much to gain.
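As a toy contrast (hypothetical feature templates, not anyone's actual system), the two regimes look like this:

```python
def engineered_features(word):
    """The NLP habit: hand-built features that are already linearly useful."""
    return {
        f"word={word}": 1.0,                          # identity, e.g. "word=Bush"
        f"is_capitalized={word[:1].isupper()}": 1.0,  # shape feature
        f"suffix3={word[-3:]}": 1.0,                  # morphology proxy
    }

def raw_character_features(word):
    """The vision-style alternative: raw positional characters.  A linear
    model over these cannot represent "word is Bush"; recovering such a
    conjunction requires a non-linear learner."""
    return {f"char[{i}]={c}": 1.0 for i, c in enumerate(word)}
```

Featuritis, in this light, is manual pre-computation of exactly the conjunctions a non-linear learner would otherwise have to discover.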

6 comments:

Bob Carpenter said...

I question Hal's presupposition that "almost everything we do in NLP is based on linear classifiers".

To the extent that we build taggers, chunkers, parsers, machine translators, speech recognizers, noisy channel spelling correctors, latent topic clusterers, social network analyzers, etc., we're not using linear models, at least in the usual sense. Some of these applications may have linear components, of course.

About the only threads of research I can think of that look even partially like linear modeling to the rest of the world would be the log-linear models of max ent and CRFs, simple TF/IDF models in information retrieval, the intriguing joint models of Dan Roth and crew at Illinois, and the use of singular value decomposition (SVD) for linear smoothing.

I have more understanding of the shock of the statisticians, which derives from machine learning's wham-bam focus on maximum a posteriori (MAP) models and first-best evaluations (0/1 loss) rather than Bayesian uncertainty estimates and probabilistic evaluation (log likelihood loss). I think this explains the prevalence of discriminative, non-probabilistic techniques like perceptrons and SVMs, which also happen to be linear.

hal said...

Hrm... maybe I'm thinking of a different sense of "linear", but all of the examples you list in your second paragraph I would consider linear models (well, not topic models, and perhaps not some social network models). If you think of almost any parser, the score for a tree is a product of a bunch of smaller scores, each an exponential of a linear function, so the overall log score is a linear function.
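to spell out the parser case (my notation, not anything from the post): for a PCFG with rule probabilities theta_r, letting c_r(t) count how often rule r fires in tree t,

```latex
\log p(t) \;=\; \sum_{r} c_r(t)\,\log\theta_r \;=\; w^\top f(t),
\qquad w_r = \log\theta_r, \quad f_r(t) = c_r(t)
```

so the log score is an ordinary linear function of the weights w over rule-count features f(t).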

I think the only examples where this doesn't hold are really the unsupervised systems... but even then, we typically assume that each latent component yields a linear function (certainly true in MT and speech). So while the overall function isn't linear, it's a mixture of linear functions.

I guess the primary comparison I'm drawing is to vision and robotics problems, where they *never* use linear kernels for anything because they suck terribly, unless you do massive feature engineering (e.g., wavelets).

Kevin said...

I'm personally a big fan of linear models, due to their simplicity of implementation, speed, ease of use, etc. I think the reason NLPers often use linear models is that our features are often discrete (or even binary), and the simple method of generating a bunch of feature combinations (thereby explicitly using polynomial-type kernels) works pretty well. I don't know whether this is the best possible solution for language data, but it certainly is reasonable.
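A minimal sketch of what "generating a bunch of feature combinations" amounts to (a hypothetical helper, not anything Kevin specifically ran): explicit pairwise conjunctions over sparse binary features, which is what an explicit degree-2 polynomial kernel computes.

```python
from itertools import combinations

def quadratic_expansion(feats):
    """Add all pairwise conjunctions of the active binary features.

    Training a linear model over this expanded space mimics a degree-2
    polynomial kernel; with sparse binary NLP features, the expansion
    is often cheap enough to do explicitly."""
    feats = set(feats)
    pairs = {f"{a}&{b}" for a, b in combinations(sorted(feats), 2)}
    return feats | pairs

# e.g. {"word=Bush", "prev=President"} gains "prev=President&word=Bush"
```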

Anonymous said...

Hello, my name is Wanee. I'm a university student from Malaysia. My group needs to build an NLP system, so I would like to know which language is suitable for building it: Java, Prolog, or something else. Please send your suggestion to my email, wanee_snar1803@yahoo.com. Thank you very much.

Bob Carpenter said...

I think I see what Hal's saying now -- as long as we make naive independence assumptions that lead to our adding log probabilities, our models look linear in some sense. For instance, PCFGs model the log probability of a tree as the sum of the log probabilities of its local trees. HMMs model the log probability of an underlying state sequence and output as a sum of the log probabilities of the transitions and emissions.

But as soon as we start asking interesting probabilistic questions, the models no longer look linear. For instance, the probability of a token having a tag in an HMM part-of-speech model requires forward/backward; the probability of a category spanning a text chunk in a PCFG requires inside/outside. Both of these require sums of probabilities, not sums of log probabilities. So the overall probability estimates are not linear.
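A compact sketch of the point (my code, the standard forward recursion): the log score of any single state path is linear in the log parameters, but the logsumexp over all paths, which token marginals need, is not.

```python
import numpy as np

def forward(log_init, log_trans, log_emit, obs):
    """HMM forward pass in log space.

    log_init[k]: log p(state k at t=0); log_trans[i, j]: log p(j | i);
    log_emit[k, o]: log p(o | k); obs: a sequence of symbol indices.
    The logsumexp inside the loop is where the non-linearity enters.
    """
    alpha = log_init + log_emit[:, obs[0]]
    for o in obs[1:]:
        m = alpha[:, None] + log_trans            # all one-step extensions
        mx = m.max(axis=0)
        alpha = mx + np.log(np.exp(m - mx).sum(axis=0)) + log_emit[:, o]
    return alpha  # alpha[k] = log p(observations so far, state k)
```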

As soon as we do the kernel trick (explicitly or implicitly), we're also non-linear in the underlying features. For instance, we might model feature F1 and feature F2, and then introduce feature F1and2, which is only on if both F1 and F2 are on. A linear model fit in an enhanced feature space (with conjunctions, or generalizations to products if features are more than indicators) would then not be linear in Hal's sense (I'm guessing).

One more thing: I've been reading Gelman and Hill's new multi-level regression book (which I'd highly recommend), and was thinking in terms of numerical predictions. In this sense, NLP tends not to model continuous variables, as are found in, say, the Netflix task.

hal said...

i think we're on the same page now :). except i don't follow the "interesting questions" bit. regardless of what question we ask, the model is still linear. sure, the answer may not be. looking for a tag marginal won't be. but by the same token, looking for the probability of a sequence squared won't be (because it's squared), and that's not a property of the model.

yes, the kernel trick throws us into non-linear land, but how many people actually use kernels? and i've myself sometimes used conjunctions, etc., but i found these by hand rather than automatically (so i still consider the underlying model linear). if the learner found the conjunctions itself, i would consider it non-linear.

i think i'll post on the 0/1 versus probabilistic loss issue soon :).
