01 April 2010

Classification weirdness, regression simplicity

In the context of some work on multitask learning, we came to realize that classification is kind of weird. Or at least linear classification. It's not weird in a way that we didn't already know; it's more a case of the law of unintended consequences.

If we're doing linear (binary) classification, we all know that changing the magnitude of the weight vector doesn't change the predictions. A standard exercise in a machine learning class might be to show that if your data is linearly separable, then for some models (for instance, unregularized models), the best solution is usually an infinite norm weight vector that's pointing in the right direction.
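Here is a minimal sketch of both halves of that claim, assuming a made-up, linearly separable toy dataset and unregularized logistic loss (the data and the helper names `dot`, `predict`, `logistic_loss`, and `scaled` are purely illustrative). Scaling the weight vector leaves every prediction unchanged, while the logistic loss keeps shrinking as the norm grows, which is why the unregularized optimum runs off to infinity.

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    """Linear binary classifier: the sign of the score w.x."""
    return 1 if dot(w, x) >= 0 else -1

def logistic_loss(w, data):
    """Average logistic loss log(1 + exp(-y * w.x)), labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-y * dot(w, x))) for x, y in data) / len(data)

def scaled(w, c):
    """Return the weight vector w multiplied by the constant c."""
    return tuple(c * wi for wi in w)

# Hypothetical toy data, linearly separable by the direction (1, 1).
data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
w = (1.0, 1.0)
scales = [1.0, 10.0, 100.0]

# Same direction, three different norms.
predictions = [[predict(scaled(w, c), x) for x, _ in data] for c in scales]
losses = [logistic_loss(scaled(w, c), data) for c in scales]

for c, p, l in zip(scales, predictions, losses):
    print(c, p, l)
```

The predictions are identical at every scale, but the loss is strictly smaller each time the norm goes up.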

This is definitely not true of (linear) regression. Taking a good (or even perfect) linear regressor and blowing up the weights by some constant will kill your performance. By adding a regularizer, what you're basically doing is just saying how big you want that norm to be.
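To make the contrast concrete, here is a small sketch with a hypothetical one-dimensional dataset where y = 2x exactly, so the weight 2 is a perfect regressor (the data and the `mse` helper are assumptions for illustration only).

```python
def mse(w, data):
    """Mean squared error of the one-dimensional linear regressor y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Hypothetical noise-free data generated by y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 2.0

perfect_error = mse(w, data)        # the perfect regressor
blown_up_error = mse(10 * w, data)  # same "direction", ten times the magnitude

print(perfect_error, blown_up_error)
```

Unlike the classifier, the blown-up regressor goes from zero error to a very large one, so the magnitude of the weight is part of the solution itself.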

Of course, by regression I simply mean minimizing something like squared error and by classification I mean something like 0/1 loss or hinge loss or logistic loss or whatever.

I think this is stuff that we all know.

Where this can bite you in unexpected ways is the following. In lots of problems, like domain adaptation and multitask learning, you end up making assumptions roughly of the form "my weight vector for domain A should look like my weight vector for domain B" where "look like" is really the place where you get to be creative and define things how you feel best.

This is all well and good in the regression setting. A magnitude 5 weight in regression means a magnitude 5 weight in regression. But not so in classification. Since you can arbitrarily scale your weight vectors and still get the same decision boundaries, a magnitude 5 weight kind of means nothing. Or at least it means something that has to do more with the difficulty of the problem and how you chose to set your regularization parameter, rather than something to do with the task itself.

Perhaps we should be looking for definitions of "look like" that are insensitive to things like magnitude. Sure you can always normalize all your weight vectors to unit norm before you co-regularize them, but that loses information as well.
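As one illustration (not a proposal from this post), compare two candidate notions of "look like" on a pair of hypothetical weight vectors that point in exactly the same direction but differ in norm by a factor of 5. Squared Euclidean distance, the usual co-regularization penalty, sees them as far apart; cosine distance, which is just one possible magnitude-insensitive choice, sees them as identical. The vector names and helper functions below are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def squared_distance(u, v):
    """Magnitude-sensitive: the usual ||u - v||^2 penalty."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_distance(u, v):
    """Magnitude-insensitive: 1 - cos(angle between u and v)."""
    return 1.0 - dot(u, v) / (norm(u) * norm(v))

# Hypothetical weight vectors: same decision boundary, different norms.
w_easy = (0.6, 0.8)   # norm 1
w_hard = (3.0, 4.0)   # norm 5

print(squared_distance(w_easy, w_hard))  # large, despite identical boundaries
print(cosine_distance(w_easy, w_hard))   # essentially zero
print(norm(w_easy), norm(w_hard))        # the information that normalizing discards
```

The last line is the catch mentioned above: a purely directional comparison throws the norms away, and with them whatever they say about the difficulty of each task.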

Perhaps this is a partial explanation of some negative transfer. One thing that you see, when looking at the literature in DA and MTL, is that all the tasks are typically of about the same difficulty. My expectation is that if you have two tasks that are highly related, but one is way harder than the other, you're going to get negative transfer. Why? Because the easy task will get low norm weights, and the hard task will get high norm weights. The high norm weights will pull the low norm weights toward them too much, leading to worse performance on the "easy" task. In a sense, we actually want the opposite to happen: if you have a really hard task, it shouldn't screw up everyone else that's easy! (Yes, I know that being Bayesian might help here since you'd get a lot of uncertainty around those high norm weight vectors!)


  1. I'm a little confused... a classifier can be rescaled and perform the same (measured on 0-1 loss), but when learning, the training criterion may care about the scale (0-1 loss doesn't care, log loss does). So it seems that a magnitude 5 weight does kind of mean something, if you're using something like log loss. What have I missed?

  2. I think I am missing something too. In regression, scaling the weights changes the slope. In classification, scaling the weights changes the slope too, but the zero level-set stays the same. How is it that you lose information if you make the comparison insensitive to such changes? How does the magnitude of the weight vector relate to "hardness"?

  3. How about just looking at the amount of information provided by a certain feature - as considered by the model? If the model is probabilistic, all is well.

  4. @Chris: It definitely does depend on the loss function; in practice, since we always use convex upper bounds on 0/1 loss, I agree with you that it means something, even if it's not quite clear what it means :).

    @Sam: Hrm... I guess I maybe said more than I'm willing to really defend, but certainly for a hard margin SVM, the higher the norm, the smaller the margin, and hence the "harder" the problem. That's the intuition I was building on. I still vaguely stand behind it since we often use the norm of w as a surrogate for complexity, which is a surrogate for difficulty.
