26 January 2010

A machine learner's apology

Andrew Gelman recently announced an upcoming talk by John Lafferty. This reminded me of a post I've been meaning to write for ages (years, really) but haven't gotten around to. Well, now I'm getting around to it.

A colleague from Utah (not in ML) went on a trip and spent some time talking to a computational statistician, who will remain anonymous. But let's call this person Alice. The two were talking about various topics and at one point machine learning came up. Alice commented:
"Machine learning is just non-rigorous computational statistics."
Or something to that effect.

A first reaction is to get defensive: no it's not! But Alice has a point. Some subset of machine learning, in particular the more Bayesian side, overlaps quite a bit with compstats, so much so that in some cases the two are probably not really distinguishable. (That is to say, there's a subset of ML that's very, very similar to a subset of compstats... though you could probably fairly easily find opposite extremes that are amazingly different.)

And there's a clear intended interpretation to the comment: it's not that we're not rigorous, it's that we're not rigorous in the way that computational statisticians are. To that end, let me offer a glib retort:
Computational statistics is just machine learning where you don't care about the computation.
In much the same way that I think Alice's claim is true, I think this claim is also true. The part of machine learning that's really strong on the CS side cares a lot about computation: how long, how much space, how many samples, etc., it will take to learn something. This is something that I've rarely seen in compstats, where the big questions really have to do with things like: is this distribution well defined, can I sample from it, etc., now let me run Metropolis-Hastings. (Okay, I'm still being glib.)

I saw a discussion on a theory blog recently that STOC/FOCS is about "THEORY of algorithms" while SODA is about "theory of ALGORITHMS" or something like that. (Given the capitalization, perhaps it was Bill Gasarch :)?) Likewise, I think it's fair to say that classic ML is "MACHINE learning" or "COMPUTATIONAL statistics" and classic compstats is "machine LEARNING" or "computational STATISTICS." We're really working on very similar problems, but the things that we value tend to be different.

Because of that, I've always found it odd that there's not more interaction between compstats and ML. Sure, there's some... but not very much. Maybe it's cultural, maybe it's institutional (conferences versus journals), maybe we really know everything we need to know about the other side and talking wouldn't really get us anywhere. But if it's just a sense of "I don't like you because you're treading on my field," then that's not productive (in either direction), even if it is an initial gut reaction.

13 comments:

  1. I think the issue here is largely semantic.

    Was the inventor of L-BFGS a computational statistician, a machine learnologist, or merely a numerical computationalist? What about all the heavy algorithmic lifting involved in sampling (e.g. Robert and Casella's book)? What about the folks who implemented BUGS? Or something like lmer or even the basic regression in R (or SAS or Stata)? Which field gets to claim the bootstrap or Metropolis-Hastings or Gibbs sampling?

    Are all those convergence theorems in NIPS papers more computational stats than machine learning, given that they're not about algorithms per se?

    PS. I really think one of the big cultural differences is one Lafferty mentioned, saying "the goal of machine learning is to develop computer programs that predict well, according to some measure of risk or accuracy". That seems to make it a branch of decision theory focusing on algorithms.

    Statisticians seem more focused on the analysis of fixed data sets rather than generalization behavior to unseen data. They tend to care more about causality and significance for this reason, because they're trying, in some sense, to use statistics to draw scientific conclusions.

  2. To put Bob's and John's points in summary form: machine learning cares about cost and risk, statistics cares about truth (hypothesis testing). In "The Emergence of Probability," Ian Hacking wrote about the conceptual struggle to separate probability (truth) from expectation (risk/reward). My tongue-in-cheek cynical conclusion is that probability allowed statisticians to save their necks in politically perilous times by leaving the prediction and prescription business to economists and other risk-takers.

  3. I totally (mostly) agree with Bob and Fernando -- in fact, that's what I tell students on day one of my machine learning class!

    But I don't think it's the whole story. There are whole parts of machine learning that don't care about predictions (a large subset of unsupervised learning, for instance). And similarly there are parts of statistics that care about predictions (as Bob points out, some parts of decision theory).

    So I think it's a good first approximation, but it's not the whole story.

  4. Although it seems to be unfashionable within some of the more statistically oriented parts of the community today, machine learning used to worry about some larger goals of 'intelligence' - something statisticians never really worried about or had much to say about...

  5. Interesting post! We've been trying to foster just such a collaboration here at UCL in the UK at our "Centre for Computational Statistics and Machine Learning":

    http://www.csml.ucl.ac.uk/

  6. This is very interesting information. I am doing some research for a class in school, and I liked the post. Do you know where I can find other information regarding this? I am finding other information on this, but nothing that I can really use in my paper for my final. Do you have any suggestions?
