Comments on natural language processing blog: "Non-linear models in NLP"

hal (2007-05-14 13:03):

i think we're on the same page now :). except i don't follow the "interesting questions" bit.
regardless of what question we ask, the *model* is still linear. sure, the answer may not be. looking for a tag marginal won't be. but by the same token, looking for the probability of a sequence squared won't be (because it's squared), and that's not a property of the model.

yes, the kernel trick throws us into nonlinear land, but how many people actually use kernels? and i myself have sometimes used conjunctions, etc., but i found these by hand rather than automatically (so i still consider the underlying model linear). if the model found the conjunctions itself, i would consider it non-linear.

i think i'll post on the 0/1 versus probabilistic loss issue soon :).

Anonymous (2007-05-09 16:05):

I think I see what Hal's saying now: as long as we make naive independence assumptions that lead to our adding log probabilities, our models look linear in some sense. For instance, PCFGs model the log probability of a tree as the sum of the log probabilities of its local trees. HMMs model the log probability of an underlying state sequence and output as a sum of the log probabilities of the transitions and emissions.

But as soon as we start asking interesting probabilistic questions, the models no longer look linear. For instance, the probability of a token having a tag in an HMM part-of-speech model requires forward/backward; the probability of a category spanning a text chunk in a PCFG requires inside/outside. Both of these require sums of probabilities in the linear domain, not sums of log probabilities, so the overall probability estimates are not linear.

As soon as we do the kernel trick (explicitly or implicitly), we're also non-linear in the underlying features.
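The forward/backward point above can be made concrete with a toy HMM (all numbers below are invented for illustration): the log-probability of one full tag path is a plain sum of log-parameters, but a single-token tag marginal mixes sums and products of probabilities, so it is not linear in the log-parameters. A minimal sketch:

```python
# Toy 2-tag HMM. The joint log-probability of a full tag sequence is a
# *sum* of log-parameters (linear), but a single-token tag marginal
# requires forward/backward, which sums probabilities in the linear
# domain. All numbers here are invented for illustration.
import math

tags = [0, 1]
pi    = [0.6, 0.4]                      # initial tag distribution
trans = [[0.7, 0.3], [0.4, 0.6]]        # trans[i][j] = P(tag j | tag i)
emit  = [[0.9, 0.1], [0.2, 0.8]]        # emit[i][o] = P(obs o | tag i)
obs   = [0, 1, 0]                       # an observed sequence

# Forward pass: alpha[t][j] = P(obs[:t+1], tag_t = j)
alpha = [[pi[j] * emit[j][obs[0]] for j in tags]]
for t in range(1, len(obs)):
    alpha.append([emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in tags)
                  for j in tags])

# Backward pass: beta[t][i] = P(obs[t+1:] | tag_t = i)
beta = [[1.0, 1.0] for _ in obs]
for t in range(len(obs) - 2, -1, -1):
    beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in tags)
               for i in tags]

Z = sum(alpha[-1])                      # P(obs)

# Marginal P(tag_1 = j | obs): a ratio of sums of products of parameters
marginal = [alpha[1][j] * beta[1][j] / Z for j in tags]
print(marginal)  # sums to 1

# By contrast, the log-probability of one full tag path is linear:
path = [0, 0, 1]
log_p = math.log(pi[path[0]]) + math.log(emit[path[0]][obs[0]])
for t in range(1, len(obs)):
    log_p += math.log(trans[path[t - 1]][path[t]]) + math.log(emit[path[t]][obs[t]])
print(log_p)     # a plain sum of log-parameters
```

The contrast is exactly the one under discussion: the path score is a dot product in log space, while the marginal divides a sum of products by a normalizer.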
For instance, we might model feature F1 and feature F2, and then introduce a feature F1and2 which is on only if both F1 and F2 are on. A linear model fit in an enhanced feature space (with conjunctions, or generalizations to products if features are more than indicators) would then not be linear in Hal's sense (I'm guessing).

One more thing: I've been reading Gelman and Hill's new multi-level regression book (which I'd highly recommend), and was thinking in terms of numerical predictions. In this sense, NLP tends not to model continuous variables, as are found in, say, the Netflix task.

Anonymous (2007-05-07 21:32):

hello.. My name is wanee. I'm from Malaysia and a university student. My group needs to build an NLP system, so I want to know which language (Java, Prolog, or another) would be suitable for building this system. Please send your suggestion to my email, wanee_snar1803@yahoo.com. Thank you very much.

Kevin Duh (2007-05-07 15:54):

I'm personally a big fan of linear models, due to their simplicity of implementation, speed, ease of use, etc. I think the reason NLPers often use linear models is that our features are often discrete (or even binary), and the simple method of generating a bunch of feature combinations (thereby explicitly using polynomial-type kernels) works pretty well.
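The explicit feature-combination trick Kevin describes might be sketched like this (feature names and weights invented): a model that stays linear in an expanded feature space — the raw indicators plus their conjunction — can represent an XOR-style interaction that no linear function of the raw features alone can.

```python
# A linear scorer over an expanded feature space. The conjunction
# feature "f1&f2" fires only when both raw indicators fire, letting a
# *linear* model in the expanded space capture an interaction that is
# non-linear in the raw features. Names and weights are invented.

def expand(feats):
    """Add the conjunction of f1 and f2 as an explicit feature."""
    out = dict(feats)
    out["f1&f2"] = feats.get("f1", 0) * feats.get("f2", 0)
    return out

def score(weights, feats):
    """Plain dot product: linear in whatever feature space it is given."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

# XOR-like target: reward either feature alone, penalize both together.
w = {"f1": 1.0, "f2": 1.0, "f1&f2": -2.0}

for f1 in (0, 1):
    for f2 in (0, 1):
        s = score(w, expand({"f1": f1, "f2": f2}))
        print(f1, f2, s)  # 0 0 -> 0.0, 1 0 -> 1.0, 0 1 -> 1.0, 1 1 -> 0.0
```

Whether one then calls the overall system "linear" is precisely the terminological question the thread is debating: the scorer is linear, but only because a human (or a kernel) built the conjunction.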
I don't know whether this is the best possible solution for language data, but it certainly is reasonable.

hal (2007-05-07 15:29):

Hrm... maybe I'm thinking of a different sense of "linear", but all of the examples you list in your second paragraph I would consider linear models (well, not topic models, and perhaps not some social network models). If you think of almost any parser, you get a score for a tree that's a product of a bunch of smaller scores, which are linear (exponential) functions, yielding an overall linear function.

I think the only examples where this doesn't hold are really the unsupervised systems... but even then, we typically assume that each latent component yields a linear function (certainly true in MT and speech). So while the overall function isn't linear, it's a mixture of linear functions.

I guess the primary comparison I'm drawing is to vision and robotics problems, where they *never* use linear kernels for anything because they suck terribly, unless you do massive feature engineering (e.g., wavelets, etc.).

Anonymous (2007-05-07 15:20):

I question Hal's presupposition that "almost everything we do in NLP is based on linear classifiers".

To the extent that we build taggers, chunkers, parsers, machine translators, speech recognizers, noisy-channel spelling correctors, latent topic clusterers, social network analyzers, etc., we're not using linear models, at least in the usual sense. Some of these applications may have linear components, of course.
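Hal's product-of-scores sense of "linear" above can be made concrete with a toy PCFG (grammar and probabilities invented): a tree score that is a product of local rule probabilities becomes, in log space, a dot product between log-parameters and rule counts — a linear function of the tree's features.

```python
# Hal's sense of "linear": a PCFG scores a tree as a product of rule
# probabilities, so the *log* score is a dot product between a weight
# vector (log rule probabilities) and a feature vector (rule counts).
# The grammar below is invented for illustration.
import math
from collections import Counter

rule_prob = {
    ("S", "NP VP"): 1.0,
    ("NP", "dogs"): 0.5,
    ("NP", "cats"): 0.5,
    ("VP", "bark"): 1.0,
}

# A derivation is just the multiset of rules it uses.
derivation = [("S", "NP VP"), ("NP", "dogs"), ("VP", "bark")]

# Product form: P(tree) = product of rule probabilities
p = 1.0
for r in derivation:
    p *= rule_prob[r]

# Linear form: log P(tree) = w . f(tree), with w = log-probs, f = counts
w = {r: math.log(q) for r, q in rule_prob.items()}
f = Counter(derivation)
log_p = sum(w[r] * c for r, c in f.items())

assert abs(math.log(p) - log_p) < 1e-12   # same score, two views
print(p, log_p)
```

This is why, in Hal's usage, "almost any parser" counts as a linear model: the disagreement in the thread is over whether that log-space view is the usual sense of the word.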
About the only threads of research I can think of that look even partially like linear modeling to the rest of the world would be the log-linear models of max ent and CRFs, simple TF/IDF models in information retrieval, the intriguing joint models of Dan Roth and crew at Illinois, and the use of singular value decomposition (SVD) for linear smoothing.

I have more understanding of the shock of the statisticians, which derives from machine learning's wham-bam focus on maximum a posteriori (MAP) models and first-best evaluations (0/1 loss) rather than Bayesian uncertainty estimates and probabilistic evaluation (log likelihood loss). I think this explains the prevalence of discriminative, non-probabilistic techniques like perceptrons and SVMs, which also happen to be linear.
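The two evaluations contrasted above can be sketched on invented predictions: 0/1 loss only asks whether the first-best (MAP) label is right, while log likelihood loss also charges for misplaced probability mass, so a hedged error and an overconfident error score very differently.

```python
# 0/1 loss vs. log likelihood loss on invented predictions. 0/1 loss
# treats every wrong first-best answer the same; log loss punishes a
# confident wrong answer much more than a hedged one.
import math

# (predicted distribution over labels, true label) -- numbers invented
examples = [
    ({"N": 0.90, "V": 0.10}, "N"),   # confident and right
    ({"N": 0.60, "V": 0.40}, "V"),   # hedged, wrong first-best
    ({"N": 0.99, "V": 0.01}, "V"),   # confident and wrong
]

zero_one = 0.0
log_loss = 0.0
for dist, gold in examples:
    map_label = max(dist, key=dist.get)   # first-best (MAP) decision
    zero_one += (map_label != gold)       # 0/1 loss on first-best
    log_loss += -math.log(dist[gold])     # log likelihood loss

print(zero_one / len(examples))   # same penalty for both errors
print(log_loss / len(examples))   # dominated by the confident error
```

Under 0/1 loss the second and third predictions are equally bad; under log loss the third alone contributes about -ln(0.01) ≈ 4.6, which is the "Bayesian uncertainty" point being made.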