16 December 2009

From Kivenen/Warmuth and EG to CW learning and Adaptive Regularization

This post is a bit of a historical retrospective, because it's only been recently that these things have aligned themselves in my head.

The all goes back to Jyrki Kivenen and Manfred Warmuth's paper on exponentiated gradient descent that dates back to STOC 1995. For those who haven't read this paper, or haven't read it recently, it's a great read (although it tends to repeat itself a lot). It's particularly interesting because they derive gradient descent and exponentiated gradient descent (GD and EG) as a consequence of other assumptions.

In particular, suppose we have an online learning problem, where at each time step we receive an example x, make a linear prediction (w'x) and then suffer a loss. The idea is that if we suffer no loss, then we leave w as is; if we do suffer a loss, then we want to balance two goals:

  1. Change w enough so that we wouldn't make this error again
  2. Don't change w too much
The key question is how to define "too much." Suppose that we measure changes in w by looking at Euclidean distance between the updated w and the old w. If we work through the math for enforcing 1 while minimizing 2, we derive the gradient descent update rule that's been used for optimizing, eg., perceptrons for squared loss for ages.

The magic is what happens if we use something other than Euclidean distance. If, instead, we assume that the ws are all positive, we can use an (unnormalized) KL divergence to measure differences between weight vectors. Doing this leads to multiplicative updates, or the exponentiated gradient algorithm.

(Obvious (maybe?) open question: what happens if you replace the distance with some other divergence, say a Bregman, or alpha or phi-divergence?)

This line of thinking leads naturally to Crammer et al.'s work on Online Passive Aggressive algorithms, from JMLR 2006. Here, the idea remains the same, but instead of simply ensuring that we make a correct classification, ala rule (1) above, we ensure that we make a correct classification with a margin of at least 1. They use Euclidean distance to measure the difference in weight vectors, and, for many cases, can get closed-form updates that look GD-like, but not exactly GD. (Aside: what happens if you use, eg., KL instead of Euclidean?)

Two years later, Mark Dredze, Koby Crammer and Fernando Pereira presented Confidence-Weighted Linear Classification. The idea here is the same: don't change the weight vectors too much, but achieve good classification. The insight here is to represent weight vectors by distributions over weight vectors, and the goal is to change these distributions enough, but not too much. Here, we go back to KL, because KL makes more sense for distributions, and make a Gaussian assumption on the weight vector distribution. (This has close connections both to PAC-Bayes and, if I'm wearing my Bayesian hat, Kalman filtering when you make a Gaussian assumption on the posterior, even though it's not really Gaussian... it would be interesting to see how these similarities play out.)

The cool thing here is that you effectively get variable learning rates on different parameters, where confident parameters get moved less. (In practice, one really awesome effect is that you tend to only need one pass over your training data to do a good job!) If you're interested in the Bayesian connection, you can get a very similar style algorithm if you do EP on a Bayesian classification algorithm (by Stern, Herbrich and Graepel), which is what Microsoft Bing uses for online ads.

This finally bring us to NIPS this year, where Koby Crammer, Alex Kulesza and Mark Dredze presented work on Adaptive Regularization of Weight Vectors. Here, they take Confidence Weighted classification and turn the constraints into pieces of the regularizer (somewhat akin to doing a Lagrangian trick). Doing so allows them to derive a representer theorem. But again, the intuition is exactly the same: don't change the classifier too much, but enough.

All in all, this is a very interesting line of work. The reason I'm posting about it is because I think seeing the connections makes it easier to sort these different ideas into bins in my head, depending on what your loss is (squared versus hinge), what your classifier looks like (linear versus distribution over linear) and what your notion of "similar classifiers" is (Euclidean or KL).

(Aside: Tong Zhang has a paper on regularized winnow methods, which fits in here somewhere, but not quite as cleanly.)


Nikos said...

Hi Hal,

Thank you for blogging about the interesting developments in online learning. I may be wrong, but I think the open question you pose has been answered in the last few years.

Take any strictly convex function F mapping a convex set A to the real numbers. Assume F has continuous gradient whose norm goes to infinity as we approach the boundary of A. Then the following algorithm can learn a linear classifier that makes sure the changes in Bregman divergences induced by F are small:

Start with w = grad(F(0))
Predict with sign(w*x)
If a mistake was made update w = grad(F(grad(F*(w))+a*y*x)

where F* is the convex conjugate of F. A good explanation of all these can be found in "Prediction Learning and Games" by Cesa-Bianchi and Lugosi.

P.S. Is there a way to write math in LaTeX in the comments?

अम्बुज said...

Following up on the last comment by Nikos, here's a beautiful paper by Beck and Teboulle that shows how the classic "Mirror Descent" algorithm can be viewed as a projected gradient method where the Euclidean distance had been replaced by a Bregman divergence:


Similar algorithms are known where we use a Csiszar f-divergence instead. A query like "proximal algorithm csiszar" on your favourite search engine is sure to turn up some good reference.

Fernando Pereira said...

Thanks for the nice survey! Koby, Mehryar, and I had a paper at last AIStats where we explored the PAC-bayesian connection in the analysis of a batch CW algorithm.

rr8004 said...

Very nice information. Thanks for this. Please come visit my site City Guide Aurora when you got time.

rr8004 said...

Very nice information. Thanks for this. Please come visit my site Aurora City Business Listings when you got time.

Michigan Lawyer said...

This is such a great resource that you are providing and you give it away for free. I love seeing websites that understand the value of providing a quality resource for free. It’s the old what goes around comes around routine.

Michigan Attorneys, Michigan
, Michigan Law Firms,
Michigan Law Offices, Michigan
Legal Services
, Attorneys
In Michigan
, Michigan Lawyer
, Michigan Attorney
, Michigan Accident Attorneys, Michigan Administrative & Governmental Law Attorneys, Michigan Adoption Attorneys, Michigan Agricultural Law Attorneys, Michigan Appeals Attorneys, Michigan Arbitration & Mediation Services, Michigan Arbitration & Mediation Services Attorneys, Michigan Asbestos Diseases Attorneys, Michigan Asset Protection Attorneys, Michigan Attorneys, Michigan Attorneys&#; Information & Referral Services, Michigan Attorneys&#; Support Services, Michigan Banking & Investment Law Attorneys, Michigan Bankruptcy Attorneys, Michigan Business Services, Michigan Child Abuse Law Attorneys

Family Lawyer Minnesota said...

There are certainly a lot of details like that to take into consideration. That’s a great point to bring up. I offer the thoughts above as general inspiration but clearly there are questions like the one you bring up where the most important thing will be working in honest good faith.

Minnesota Attorneys, Minnesota
, Minnesota Law Firms,
Minnesota Law Offices, Minnesota
Legal Services
, Attorneys
In Minnesota
, Minnesota Lawyer
, Minnesota Attorney
, Minnesota Accident Attorneys, Minnesota Administrative & Governmental Law Attorneys, Minnesota Adoption Attorneys, Minnesota Agricultural Law Attorneys, Minnesota Appeals Attorneys, Minnesota Arbitration & Mediation Services, Minnesota Arbitration & Mediation Services Attorneys, Minnesota Asbestos Diseases Attorneys, Minnesota Asset Protection Attorneys, Minnesota Attorneys, Minnesota Attorneys&#; Information & Referral Services, Minnesota Attorneys&#; Support Services, Minnesota Banking & Investment Law Attorneys, Minnesota Bankruptcy Attorneys, Minnesota Business Services, Minnesota Child Abuse Law Attorneys

Unknown said...

cheap nike shox
cheap sport shoes
nike tn dollar
ed hardy ugg boots
ed hardy love kills slowly
ed hardy clothing us
ed hardy clothing
cheap ed hardy
cheap ed hardy clothing
ed hardy clothes
ed hardy wholesale
ed hardy clothing
ed hardy t shirts
ed hardy shirts
ed hardy uk
ed hardy t shirts
ed hardy shirts
ed hardy hoodies
cheap nike max ,。
puma future cat
ed hardy ugg boots.
ed hardy love kills slowly boots.
ed hardy love kills slowly.
ed hardy polo shirts.
cheap ed hardy clothing,.
ed hardy shirts .
ed hardy t shirts.,.

Unknown said...

There is obviously a lot to know about this. I think you made some good points in Features also.
New Jersey Arbitration & Mediation Services Attorneys, New Jersey Asbestos Diseases Attorneys, New Jersey Asset Protection Attorneys, New Jersey Attorneys, New Jersey Attorneys&#; Information & Referral Services, New Jersey Attorneys&#; Support Services, New Jersey Banking & Investment Law Attorneys, New Jersey Bankruptcy Attorneys, New Jersey Business Services, New Jersey Child Abuse Law Attorneys

Unknown said...

found your site on del.icio.us today and really liked it.. i bookmarked it and will be back to check it out some more later ..
State of New Jersey Lawyer Directory,
New Jersey Attorney Search,
New Jersey Lawyers Search, Find
A New Jersey Attorney Lawyers
, New Jersey Civil Law Attorneys, New Jersey Collection Law Attorneys, New Jersey Computers & Technology Law Attorneys, New Jersey Constitutional Law Attorneys, New Jersey Construction Law Attorneys, New Jersey Consumer Protection Attorneys, New Jersey Domestic Partnerships Attorneys, New Jersey Drug Charges Attorneys, New Jersey DUI/DWI Attorneys,

Anonymous said...

If you had some way of rating posts I would for sure give you a high rating my friend!
South Carolina health insurance, South Dakota health insurance, Tennessee health insurance, Texas health insurance, Utah health insurance, Vermont health insurance, Virginia health insurance, Washington health insurance, West Virginia health insurance, Wisconsin health insurance, Wyoming health insurance

Anonymous said...

Your summaries are always top-notch. Thanks for keeping us apprised. I’m reading every word here.
Your summaries are always top-notch. Thanks for keeping us apprised. I’m reading every word here.

Essay Service said...

Online learning are developing and gaining popularity now a day. Thanks for letting us know how the interesting developments in online leaning. I personally like your post. You have shared good insights and experiences. Keep it up.

Essay Help

combattery84 said...

ACER Travelmate 4002lmi battery
Acer travelmate 800 battery
Acer aspire 3613wlmi battery
Travelmate 2414wlmi battery
Acer batcl50l battery
Acer Travelmate 2300 battery
ACER aspire 3610 battery
ACER travelmate 4600 battery
Dell Latitude D800 battery
Dell Inspiron 600m battery
Dell Inspiron 8100 Battery
Dell Y9943 battery
Dell Inspiron 1521 battery
Dell Inspiron 510m battery
Dell Latitude D500 battery
Dell Latitude D520 battery
Dell GD761 battery
Dell NF343 battery
Dell D5318 battery
Dell G5260 battery
Dell Inspiron 9200 battery
Dell Latitude C500 battery
Dell HD438 Battery
Dell GK479 battery
Dell PC764 battery
Dell KD476 Battery
Dell Inspiron 1150 battery

ylinling001 said...

I like your article, really interesting! My point is also very good, I hope you'll like:chi flat iron are a very popular choice of hair straightener.New Balance,new Blance shoes,new Blance Outlet are some of the most comfortable and stylish shoes on the market today. The designer has a whole range of shoes for all types of athletes. five finger shoes,vibram five fingers,Five fingers shoes give women the feeling of walking barefoot while still keeping the feet protected.