03 October 2014

Machine learning is the new algorithms

When I was an undergrad, probably my favorite CS class was algorithms. I liked it (a) because my background was math, so it was the closest match to what I knew, and (b) because even though it was "theory," a lot of the stuff we learned was really relevant. Over time, it seemed like the area had distilled worthwhile algorithms from interesting-in-theory-but-you'll-never-actually-use algorithms.

In fact, I think this is a large part of why most undergraduate CS degrees today require a course in algorithms. You have these very nice, clearly defined statements, and very elegant solutions to those statements that in most cases (at the UG level) are known to be optimal.

Fast forward N years.

My claim today---and I'm speaking really as an NLP person, which is how I self-identify---is that machine learning is the new core. Everything that algorithms was to computer science 15 years ago, machine learning is today. That's not to say it won't move in another 10 years, but that's how I see it.

Why?

For the most part, algorithms (especially as taught at the UG level) is the study of one thing: Given a perfect input, how do I most efficiently compute the optimal output?

The problem is the "perfect input" part.

All of my experience in the past N years has told me that you never have a perfect input, and that it's far far far more important to be able to synthesize information from a large number of sources and reason about it than it is to find the exact-right-solution to some problem that exists only to Plato.

Even within machine learning you see this effect. Lots of numerical analysis people have worked on good algorithms for getting that last little bit of precision out of optimization algorithms. Does it matter? Nope! Model specification, parameter tuning, features, and data matter infinitely more than that last little bit of precision. (In some fields, for instance, scientific computing, that last little bit of precision may matter. I don't know enough to know one way or the other.)

Let's play a thought game. Say you're an UG CS major. You graduate and get a job in CS (not grad school). Which are you more likely to use: (1) a weighted cost flow algorithm or (2) a perceptron/decision tree?

Clearly I think the answer is (2). And I loved flow algorithms when I was an undergrad and have actually spent since 2006 trying to figure out how I can use them for a problem I want to solve. No dice.

I would actually go further. Suppose you have a problem whose inputs are ill-specified (as they always are when dealing with data), and whose structure actually does look like a flow problem. There are two CS students trying to solve this problem. Akiko knows about machine learning but not flows; Bob knows about flows but not machine learning. Bob tries to massage his data by hand into the input to an optimal flow algorithm, and then solves it exactly. Akiko uses machine learning to get good edge weights and hacks together some greedy algorithm for flows, not even knowing it's called a flow. Whose solution works better? I would put almost any amount of money on Akiko.

Full disclosure: those who know about my research in structured prediction will recognize this as a recurring theme in my own research agenda: fancy algorithms always lose to better models.

There's another big difference between N years ago and today: almost every algorithm you could possibly care about (or learn about as an UG) is implemented in a library for any reasonable programming language. That's not to say that it's unimportant to know how things work in order to use them, but I would argue it's much less important in a field like algorithms whose knowledge is comparatively stable, versus a field like machine learning where things are still changing and there is no "one right answer" to the "machine learning problem." In a field that's still a bit of an art rather than a science, understanding how things work under the hood feels a lot more important. Quicksort, heaps, minimum spanning trees, ... these are all here to stay.

Okay, so now I've convinced myself that we should yank algorithms out as an UG requirement and replace it with machine learning.

"But wait," I can hear my colleagues yelling, "taking algorithms isn't about learning algorithms: it's about learning how to think!" But that's also what I think is great about machine learning: the distance between theory and algorithms is actually usually quite small (I try to get this across at various points in CiML, to varying degrees of success). If the only point of an algorithms class (I've heard exactly this argument made about automata theory, for instance) is to teach students how to think, I think we could do much better.

Okay, so I've thrown down the gauntlet. Someone should come smack me with theirs :P!

Edit after some comments:

I think I probably wrote badly and as a result my main point got lost. I'll try to restate it here briefly and then I'll edit the main post.

Main point: I feel like for 15 years, algorithms has been at the heart of most of what computer science does. I feel like that coveted position has now changed to machine learning or, more generically, statistical reasoning. I feel this way because figuring out how to map a real world problem into something an "algorithm" can consume, especially when that needs statistical modeling of various inputs, is (IMO) a more important and harder problem than details about flow algorithms. 



EDIT #2 AFTER DISCUSSION ON SURESH'S BLOG:

let me give a concrete example that may actually be a real world example, but i don't know (though see this paper): path finding for taxis or cars. the world is a graph and given directed edge costs we can run dijkstra or whatever to find LEAST-TIME (shortest) paths. this is basically google maps/etc.

of course, we never know the true time to travel some segment. we might know it now, but by the time the driver gets to some road (5 or 10 minutes from now) the conditions may have changed. and of course we have historical data on traffic from which we can predict what the condition of the road will be like in 10 minutes.

so here, "foo" is a function that takes the time of data, historical traffic data, weather and whathaveyou, and maps it to edge costs.

"bar" is dijkstra's algorithm or whatever shortest path algorithm you like.

my claim is that if you really want to solve this problem, it's much more important to understand how to create foo than how to create bar. in particular, if i gave you a greedy or near greedy approach to bar, combined with a really good foo, i bet this would be significantly better than an optimal bar and a crappy foo.
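
here's a minimal sketch of what i mean, in python. everything concrete in it is made up: the feature set (time of day, historical traffic, weather), the choice of a decision tree for foo, and the dict-of-lists graph are all placeholders, not anything i'm claiming is the right design. the only point is that foo is a learned predictor of edge costs and bar is bog-standard dijkstra; all the interesting work is in foo.

import heapq
from sklearn.tree import DecisionTreeRegressor

# foo: map (time of day, historical traffic level, weather) to a predicted edge cost.
# the features and the model are placeholders; any regressor would do.
def train_foo(history):
    # history: list of ([time_of_day, traffic, weather], observed_travel_time) pairs
    X = [feats for feats, _ in history]
    y = [t for _, t in history]
    model = DecisionTreeRegressor(max_depth=5)
    model.fit(X, y)
    return model

def foo(model, time_of_day, traffic, weather):
    return float(model.predict([[time_of_day, traffic, weather]])[0])

# bar: plain dijkstra over whatever edge costs foo hands us.
def bar(graph, source, target):
    # graph: dict mapping node -> list of (neighbor, predicted_cost)
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [], target
    while node != source:        # walk predecessor links back to the source
        path.append(node)
        node = prev[node]
    return [source] + path[::-1]

swap bar out for something greedy and (my claim) you'd barely notice; swap foo out for a table of fixed average speeds and you would.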

18 comments:

Unknown said...

I totally agree with you.

As an undergrad, I loved algorithms. It was my favorite class. I did pretty well in programming contests (ACM world finalist) which are basically 80% algorithms and 20% coding.

And then I got to Google, and I was thrown onto a ranking/prediction task, and I flailed very badly at first until my project failed. Though I was working with statisticians (thankfully), I had a lot of trouble even understanding the stats behind really simple models. My lack of preparation for the job was almost humorous.

Later I went to grad school for an MS CS focused on Machine Learning (while working full time) and worked on a classification system for local businesses. It was really relevant and useful.

Alvin Grissom II said...

The principal concern I'd have is that, without a foundation in traditional algorithms, one might not know how to analyze machine learning algorithms or how to formalize the problems to be approximated.

We all know (well, maybe) that closed-form solutions don't typically generalize to messy data, but I tend to view algorithms as an introduction to proofs in the context of computer science and a means of gaining an intuition about complexity.

As someone relatively new to the modeling culture, one of the biggest hurdles for me, I think, was (and perhaps still is) adjusting to thinking of problems purely in terms of the model. In computation, my first instinct is to think of things in terms of for loops and tables, even when we're talking about generative models. This has shifted over the past couple of years, but I guess that my point is that I wonder how much of this is a cultural difference.

Let's be optimistic and say that an undergraduate has taken CS2, Data Structures, and some version of Discrete Math, more or less understanding 80% of it. The student steps into Machine Learning and does fine with the probability, but then has to implement an algorithm in a real programming language. The student does something ridiculous because they have little intuition about the fact that three nested for loops called on every input are a terrible idea, or about just how much more terrible each new loop makes the complexity of the program. Personally, I found Theory of Computing and Algorithms helpful in learning to visualize not only complexity, but things like state spaces and complex traversals.

I think it'd be great if ML were common in undergraduate curricula, but I think that Algorithms should probably stay.

Anonymous said...

First, I really like your "fancy algorithms always lose to better models." Before this, it was "fancy algorithms always lose to more data." But, more data without a correct model is just a waste of data collection (time/money/...).

To the "Machine learning is the new algorithms", I agree. There's machine learning everywhere like algorithm was tens years ago where a lot of people were still exciting with linear programming and simplex algorithm. Right now, I see them enjoying deep learning.

hal said...

@rrenaud: I imagine (based on teaching UG AI and UG ML) many people have had that experience.

@alvin: The complexity argument is good, but I guess the question is: could you do this in a more "realistic"/"worthwhile" setting?

I think your experience echoes a lot of what I've seen. Students (in general) love perceptron and hate logistic regression. They love Gibbs sampling and hate EM. I think a lot of this is because the "procedural" things are easy to pretend you understand because at least you can implement them. Once there are layers of abstraction (ok, we'll specify our objective function, set that to the side, and now write a generic optimizer), it gets confusing. IMO the fact that students don't learn to love abstraction in an algorithms class is a huge failure -- that should be exactly what that class is about. Heck it's what all CS classes should be about :).
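
To make that abstraction concrete, here's a tiny sketch (the logistic loss and the made-up data are just placeholders, not anything from the post): the optimizer only ever sees a gradient function, and the objective never sees the optimizer.

import numpy as np

def gradient_descent(grad, w0, lr=0.1, iters=100):
    # generic optimizer: all it knows about the problem is a gradient function
    w = w0.copy()
    for _ in range(iters):
        w -= lr * grad(w)
    return w

def logistic_grad(X, y):
    # returns the gradient function of the (unregularized) logistic loss on (X, y)
    def grad(w):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        return X.T @ (p - y) / len(y)
    return grad

# usage: plug the objective's gradient into the generic optimizer
X = np.array([[1.0, 0.2], [1.0, -0.4], [1.0, 1.5], [1.0, -1.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = gradient_descent(logistic_grad(X, y), w0=np.zeros(2))

The perceptron collapses those layers into one loop, which I suspect is part of why students find it so much friendlier.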

Sasho said...

This is based on a strange dichotomy. Formulating good models and writing good algorithms are independent (although probably not orthogonal) concerns. You are saying that a good model gives you more mileage than a good algorithm, but this is largely an apples and oranges comparison. You should want your students to be able to reason formally both about statistical models and about algorithms.

I also think the basic algorithms framework for posing and solving computational problems is still useful. Students need to be able to distinguish between a computational problem and an algorithm: k-means clustering is not the same as Lloyd's algorithm, the dictionary problem is not the same as hashing, etc. They also need to be able to tell if they are solving a problem optimally, or using a heuristic, and they need some framework to tell good heuristics from bad heuristics. And, as was pointed out, they need to have a basic understanding of complexity. Yes, maybe your final goal is hard to define, and your data is dirty, but after you decide on a methodology, and do the modeling, you still end up with a series of basic computational problems, and you still need to solve them.
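
To make that distinction concrete, here's a toy sketch (placeholder data, and no claim that this is how you'd actually want to implement it): the first function below is the computational problem, the second is one heuristic for it.

import numpy as np

def kmeans_objective(X, centers):
    # the problem: total squared distance of each point to its nearest center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def lloyd(X, k, iters=20, seed=0):
    # one well-known heuristic for that problem; no optimality guarantee
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers

A student who only ever sees lloyd() may never realize there is an objective it is (locally) trying to minimize, or ask whether a different algorithm could do better on the same objective.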

That being said, I don't disagree that the common algorithms curriculum may need some updating to make it more relevant. Covering more heuristics for example might be a good idea. I think you exaggerate how "stable" the field of algorithms is. Maybe it's just the UG algorithms course that has become a little too stable.

Balázs said...

Algorithms is basically a math course; I wouldn't throw it out. In the same sense that you need discrete math, linear algebra, and probability, you need dynamic programming and greedy algorithms (matroids, submodularity): they give you the foundations of, among other things, several learning methods. That said, ML should probably be taught in the second year, or even as part of algorithms, for example by using ML algorithms as examples to instantiate the generic techniques.

Suresh Venkatasubramanian said...

Ok you asked for it :)

http://geomblog.blogspot.com/2014/10/algorithms-is-new-algorithms.html

Consider the gauntlet returned.

Aaron said...

I agree with Sasho, and would add that in my view, "Machine Learning" defines a class of problems, and the solutions to those problems are algorithms, some of which can/should be taught in algorithms classes (though others not). Like any other class of problems, you can have solutions which consist of heuristics, or solutions with provable guarantees. An algorithms class teaches you how to reason about the latter, but there is no reason you shouldn't prove a margin-based bound on the convergence rate of the perceptron algorithm, study the convergence properties of gradient descent more generally (or multiplicative weights), and analyze Adaboost.
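
For instance, the classical mistake bound, stated loosely: if every example satisfies ||x_i|| <= R, and there is some unit vector u with y_i (u . x_i) >= gamma > 0 for all i, then the perceptron makes at most (R/gamma)^2 mistakes, regardless of how the data is ordered. That is exactly the kind of statement an algorithms class teaches you to prove.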

There is a good case to be made that algorithms classes should be updated to have less focus on classical discrete optimization problems, and to have more focus on convex optimization problems, but I don't see the case at all for eliminating the formal study of algorithms.

hal said...

Okay, maybe I shouldn't have made that comment about pulling algorithms out of the undergrad curriculum and replacing it with machine learning. It did indeed create an either/or fallacy that's obviously false.

I think I probably wrote badly and as a result my main point got lost. I'll try to restate it here briefly and then I'll edit the main post.

Main point: I feel like for 15 years, algorithms has been at the heart of most of what computer science does. I feel like that coveted position has now changed to machine learning or, more generically, statistical reasoning. I feel this way because figuring out how to map a real world problem into something an "algorithm" can consume, especially when that needs statistical modeling of various inputs, is (IMO) a more important and harder problem than details about flow algorithms.

Also, I'll note that someone should have called me out for comparing flow (a halfway-through-the-semester topic in an upper division ALG course) with decision trees (probably the first thing you'd do in an ML course). :)

To @Sasho: I really like the way you formulate (most) things in terms of "this is what I think students should be able to do." That's far more useful (IMO) pedagogically than saying "they have to know X." Most of the things you'd like them to be able to do do not, IMO, require the ability to "do algorithms" in the CLR(S) sense.

@Balazs: I disagree for essentially the same reasons I agree with Sasho: I'm much more interested in what students should be able to do than what they "need."

@Suresh: I'll reply on your blog, but essentially an elaboration of what I said above.

@Aaron: I don't think I propose that people shouldn't be able to study the formal properties of algorithms, just that we shouldn't lose sight of the fact that algorithms are an abstraction and most of the work is figuring out how to map your real world problem to something you can solve. And my opinion is that figuring out that mapping is often far more important than anything else, and often involves statistical analysis.

Yin-Tat Lee said...

I know this is not the main point here, but I still want to update you about flow algorithms.

The current best approximate algorithm (in theory) simply combines the Frank-Wolfe algorithm and oblivious routing: the maxflow problem is of the form min ||Ax-b||_inf where A is the incidence matrix. If the data is noisy, you can add a regularization term there and make it robust.
(http://arxiv.org/abs/1304.2338)
(http://arxiv.org/abs/1304.2077)

The best exact algorithm (in theory) simply uses a linear programming algorithm (because LP has been greatly improved recently). So, if the data is noisy, you can again add some term to compensate for it, and the theoretical runtime doesn't suffer.
(http://arxiv.org/abs/1312.6713)

Both algorithms are less than a year old and hence are not taught in classes yet.

I just want to point out that the (theoretically) fastest algorithms for many combinatorial optimization problems have recently been partially replaced by a convex optimization algorithm plus a carefully chosen preconditioner. Because of this, they are robust to noise and model changes: you just add a term to the objective function. So the algorithms you learned 15 years ago are somewhat different from the current ones, and they are still changing, but this hasn't propagated to other fields yet.

So they can be combined with statistical reasoning easily, in an organic way. Therefore, I would still bet some money on Bob, if he knows the current best algorithms and knows what regularization is.

Dan said...

Ok, it ate my first one. I was going to say that John Hopcroft has been saying similar things for the last year or two. Here's a nice lecture he gave http://www.heidelberg-laureate-forum.org/blog/video/lecture-friday-september-27-john-hopcroft/

And here's a draft book he's working on with Ravi Kannan from MSR (the introduction makes some similar arguments as you made): https://research.microsoft.com/en-US/people/kannan/book-no-solutions-aug-21-2014.pdf

GRRR said...

Artificial Intelligence is the good old starting point of computer science. Programming languages, parallel processing, networking, and many others branched out to meet various needs, but AI, ML, NLP, and vision are still going after the goal of "intelligence". It's good to see that we are now well equipped and converging to the origin again. The AI fields have always been discovering bags of "tricks", or algorithms, that efficiently solve some serious IQ-test problems on machines that compute. Personally, I don't see why the fields that name themselves "algorithms" or "theory" do anything different.

On the other hand, do popular ML models go beyond being just algorithms? 0-1 graphical models = max flow / min cut, regularization = duality, and sparse coding and all that submodular stuff come directly from the OR people. They are just algorithms and don't need to be connected with how things really work, which is another class in NLP, vision, financial engineering, or maybe sociology. Did those algorithms really extract knowledge from data? Yes, but all under human supervision.

Anonymous said...

You seem to constantly fall into false dichotomies. To quote:

"my claim is that if you really want to solve this problem, it's much more important to understand how to create foo than how to create bar. in particular, if i gave you a greedy or near greedy approach to bar, combined with a really good foo, i bet this would be significantly better than an optimal bar and a crappy foo."

The company that is going to win in real life is the one that implements *both* a really good bar and an optimal foo.

Anonymous said...

Also, what the student "needs" to know is the wrong way to think about it. Nobody is guaranteed a job at graduation as long as they fulfill some set of minimum requirements.

Students compete for jobs at companies (and for projects inside companies). Students that have a basic course in algorithms are much better equipped to compete.

hal said...

@Dan: thanks for the pointer!!!

@Anon1: Well, one cannot really make an interesting point if the answer is always "more is better." (Which it isn't anyway.) Of course Carlos, who has both a good foo and a good bar, will likely do better than Akiko or Bob. I'm thinking more from the perspective of where it's worth spending effort. Even if Carlos exists, it will take him time to do both parts well, and time usually means money, so my claim is that even if he knows bar, he's better off putting his time/energy/money into foo.

@Anon2: I don't think that was the main point I was trying to make. Yes, getting a job is important. No, I'm not willing to "teach to the test."

cafebeen said...

When building a model, one inevitably has to think about whether there's an efficient algorithm to do inference. Quite often there may not be one! Applying max flow algorithms to machine learning is actually a success story, e.g. Markov random fields:

Boykov et al. "Fast Approximate Energy Minimization via Graph Cuts", IEEE PAMI 2001
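
Roughly, the result that makes the connection work: for binary labels, an energy of the form E(x) = sum_i theta_i(x_i) + sum_{ij} theta_ij(x_i, x_j) can be minimized exactly by a single min-cut whenever every pairwise term is submodular, i.e. theta_ij(0,0) + theta_ij(1,1) <= theta_ij(0,1) + theta_ij(1,0); the multi-label case in the paper above is then handled approximately by solving a sequence of such binary cuts.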

Anonymous said...

hal/anon1 - it will actually take less time (and mental anguish) for Carlos to put in a call to a shortest-path algorithm from a library than for Akiko to hack up some greedy shortest-path algorithm.

learning is about leverage - learn once, use many times. with basic algorithms the leverage is huge, since you encounter the same models/algorithms your whole life.

hal/anon2 - if you go back and re-read your post, you will see that your whole post is written from the point of view of what the students will need in real life. and "teaching to real life" is different than "teaching to the test".

hal said...

@anon: yup, you're right on both counts :)