natural language processing blog: A machine learner's apology

26 January 2010

Andrew Gelman recently announced an upcoming talk by John Lafferty. This reminded me of a post I've been meaning to write for ages (years, really) but haven't gotten around to. Well, now I'm getting around to it.

A colleague from Utah (not in ML) went on a trip and spent some time talking to a computational statistician, who will remain anonymous. But let's call this person Alice. The two were talking about various topics and at one point machine learning came up. Alice commented:

"Machine learning is just non-rigorous computational statistics."

Or something to that effect.

A first reaction is to get defensive: no it's not! But Alice has a point. Some subset of machine learning, in particular the more Bayesian side, tends to overlap quite a bit with compstats, so much so that in some cases the two are probably not really distinguishable. (That is to say, there's a subset of ML that's very, very similar to a subset of compstats... though you could probably fairly easily find some antipoles that are amazingly different.)

And there's a clear intended interpretation to the comment: it's not that we're not rigorous, it's that we're not rigorous in the way that computational statisticians are. To that end, let me offer a glib retort:

Computational statistics is just machine learning where you don't care about the computation.

In much the same way that I think Alice's claim is true, I think this claim is also true. The part of machine learning that's really strong on the CS side cares a lot about computation: how long, how much space, how many samples, etc., it will take to learn something. This is something that I've rarely seen in compstats, where the big questions really have to do with things like: is this distribution well defined, can I sample from it, etc., now let me run Metropolis-Hastings. (Okay, I'm still being glib.)

I saw a discussion on a theory blog recently that STOC/FOCS is about "THEORY of algorithms" while SODA is about "theory of ALGORITHMS" or something like that. (Given the capitalization, perhaps it was Bill Gasarch :)?) Likewise, I think it's fair to say that classic ML is "MACHINE learning" or "COMPUTATIONAL statistics" and classic compstats is "machine LEARNING" or "computational STATISTICS." We're really working on very similar problems, but the things that we value tend to be different.

Because of that, I've always found it odd that there's not more interaction between compstats and ML. Sure, there's some... but not very much. Maybe it's cultural, maybe it's institutional (conferences versus journals), maybe we really know everything we need to know about the other side and talking wouldn't really get us anywhere. But if it's just a sense of "I don't like you because you're treading on my field," then that's not productive (in either direction), even if it is an initial gut reaction.

14 comments:

Bob Carpenter said...: I think the issue here is largely semantic.

Was the inventor of L-BFGS a computational statistician, a machine learnologist, or merely a numerical computationalist? What about all the heavy algorithmic lifting involved in sampling (e.g. Robert and Casella's book)? What about the folks who implemented BUGS? Or something like lmer or even the basic regression in R (or SAS or Stata)? Which field gets to claim the bootstrap or Metropolis-Hastings or Gibbs sampling?

Are all those convergence theorems in NIPS papers more computational stats than machine learning, given that they're not about algorithms per se?

PS. I really think one of the big cultural differences is one Lafferty mentioned, saying "the goal of machine learning is to develop computer programs that predict well, according to some measure of risk or accuracy". That seems to make it a branch of decision theory focusing on algorithms.

Statisticians seem more focused on the analysis of fixed data sets rather than generalization behavior to unseen data. They tend to care more about causality and significance for this reason, because they're trying, in some sense, to use statistics to draw scientific conclusions.; 26 January, 2010 12:58
Fernando Pereira said...: To put Bob's and John's points in summary form: machine learning cares about cost and risk, statistics cares about truth (hypothesis testing). In "The Emergence of Probability," Ian Hacking wrote about the conceptual struggle to separate probability (truth) from expectation (risk/reward). My tongue-in-cheek cynical conclusion is that probability allowed statisticians to save their necks in politically perilous times by leaving the prediction and prescription business to economists and other risk-takers.; 27 January, 2010 04:42
hal said...: I totally (mostly) agree with Bob and Fernando -- in fact, that's what I tell students on day one of my machine learning class!

But I don't think it's the whole story. There are whole parts of machine learning that don't care about predictions (a large subset of unsupervised learning, for instance). And similarly there are parts of statistics that care about predictions (as Bob points out, some parts of decision theory).

So I think it's a good first approximation, but it's not the whole story.; 27 January, 2010 08:31
Anonymous said...: Although it seems to be unfashionable within some of the more statistically oriented parts of the community today, machine learning used to worry about some larger goals of 'intelligence' - something statisticians never really worried about or had much to say about...; 28 January, 2010 04:56
Unknown said...: Interesting post! We've been trying to foster just such a collaboration here at UCL in the UK at our "Centre for Computational Statistics and Machine Learning":

http://www.csml.ucl.ac.uk/; 09 March, 2010 10:35
