Comments on natural language processing blog: Machine learning is the new algorithms

@anon: yup, you're right on both counts :)

2014-10-10T11:21:43.083-06:00

@anon: yup, you're right on both counts :)

hal/anon1 - it will actually take less time (and m...

2014-10-08T10:26:53.714-06:00

hal/anon1 - it will actually take less time (and mental anguish) for Carlos to put in a call to a shortest-path algorithm from a library than for Akiko to hack up some greedy shortest-path algorithm.

learning is about leverage - learn once, use many times. with basic algorithms the leverage is huge, since you encounter the same models/algorithms your whole life.

hal/anon2 - if you go back and re-read your post, you will see that your whole post is written from the point of view of what the students will need in real life. and "teaching to real life" is different than "teaching to the test".

When building a model, one does inevitably have to...

2014-10-08T09:45:11.743-06:00

When building a model, one does inevitably have to think about whether there's an efficient algorithm to do inference. Quite often there may not be one! Applying max flow algorithms to machine learning is actually a success story, e.g. markov random fields:

Boykov et al. "Fast Approximate Energy Minimization via Graph Cuts", IEEE PAMI 2001

@Dan: thanks for the pointer!!! @Anon1: Well one ...

2014-10-08T07:06:05.484-06:00

@Dan: thanks for the pointer!!!

@Anon1: Well one cannot really make an interesting point if the answer is always "more is better." (Which it isn't anyway.) Of course Carlos who has both a good foo and a good bar will likely do better than Akiko or Bob. I'm thinking more from the perspective of the notion of where is it worth spending effort. Even if Carlos exists, it will take him time to do both parts well, and time usually means money, and so my claim is that even if he knows bar, he's better off putting his time/energy/money into foo.

@Anon2: I don't think that was the main point I was trying to make. Yes, getting a job is important. No, I'm not willing to "teach to the test."

Also, what the student "needs" to know i...

2014-10-08T03:07:44.239-06:00

Also, what the student "needs" to know is the wrong way to think about it. Nobody is guaranteed a job at graduation as long as they fulfill some set of minimum requirements.

Students compete for jobs at companies (and for projects inside companies). Students that have a basic course in algorithms are much better equipped to compete.

You seem to constantly fall into false dichotomies...

2014-10-08T03:02:59.346-06:00

You seem to constantly fall into false dichotomies. To quote:

<>

The company that is going to win in real life is the one that implements *both* a really good bar and an optimal foo.

Artificial Intelligence is the good old starting p...

2014-10-07T12:47:46.311-06:00

Artificial Intelligence is the good old starting point of the computer science. Programming language, parallel processing ,networking and many others branched out to meet the needs, but AI, ML, NLP and vision are still going after the goal of "intelligence". It's good to see that we are now well-equipped and converging to the origin again. The AI fields has always been discovering bags of "tricks" or algorithms that efficiently solve some serious IQ test problems on machines that compute. Personally I don't see why the fields that name themselves "algorithm" or "theory" do anything different.

On the other hand do popular ML models go beyond being just algorithms? 0-1 graphical models = max flow min cut, regularization = duality and sparse coding and those submodular stuff are directly from the OR people. They are just algorithms and don't need to be connected with how things really work, which is another class in either NLP, vision, finance engineering or maybe sociology. Did those algorithms really extracted knowledge from data? Yes but all under human supervision.

Ok, it ate my first one. I was going to say that J...

2014-10-07T11:09:47.439-06:00

Ok, it ate my first one. I was going to say that John Hopcroft has been saying similar things for the last year or two. Here's a nice lecture he gave http://www.heidelberg-laureate-forum.org/blog/video/lecture-friday-september-27-john-hopcroft/

And here's a draft book he's working on with Ravi Kannan from MSR (the introduction makes some similar arguments as you made): https://research.microsoft.com/en-US/people/kannan/book-no-solutions-aug-21-2014.pdf

I know this is not the main point here, but I stil...

2014-10-06T10:42:50.602-06:00

I know this is not the main point here, but I still want to update you about flow algorithms.

The currently best approximate algorithm (in theory) is simply combining Frank-Wolfe algorithm and oblivious routing. It used only one first, the maxflow problem is of the form
min ||Ax-b||_inf where A is incidence matrix. If data is noisy, you can add regularize term there and make it robust.
(http://arxiv.org/abs/1304.2338)
(http://arxiv.org/abs/1304.2077)

The best exact algorithm (in theory) is simply using linear programming algorithm. (because it is greatly improved recently.) So, if the data is noisy, again you can add some term to compensate it and the theoretical runtime will not lose there.
(http://arxiv.org/abs/1312.6713)

Both algorithms is within 1 years and hence it is not taught in class yet.

I am just want to point out the (theoretical) fastest algorithm for many combinatorial optimization problems recently have been partially replaced by an convex optimization algorithm + carefully chosen preconditioner. Due to this fact, they are robust in noise and model changes by adding a term into the objective function. So, the algorithm you learnt 15 years is slightly different than the current algorithms and they are still changing, but have not been propagated to other field yet.

So, they can be combined with statistical reasoning easily in an organic way. Therefore, I will still bet some of money on Bob if he knows the currently best algorithm and know what is regularization.

Okay, maybe I shouldn't have made that comment...

2014-10-06T09:36:10.106-06:00

Okay, maybe I shouldn't have made that comment about pulling algorithms out of the undergrad curriculum and replacing it with machine learning. It did indeed create an either/or fallacy that's obviously false.

I think I probably wrote badly and as a result my main point got lost. I'll try to restate it here briefly and then I'll edit the main post.

Main point: I feel like for 15 years, algorithms has been at the heart of most of what computer science does. I feel like that coveted position has now changed to machine learning or, more generically, statistical reasoning. I feel this way because figuring out how to map a real world problem into something an "algorithm" can consume, especially when that needs statistical modeling of various inputs, is (IMO) a more important and harder problem than details about flow algorithms.

Also, I'll note that someone should have called me out for comparing flow (a halfway-through-the-semester topic in an upper division ALG course) with decision trees (probably the first thing you'd do in an ML course). :)

To @Sasho: I really like the way you formulate (most) things in terms of "this is what I think students should be able to do." That's far more useful (IMO) pedagogically than saying "they have to know X." Most of the things you'd like them to be able to do do not, IMO, require the ability to "do algorithms" in the CLR(S) sense.

@Balazs: I disagree for essentially the same reasons I agree with Sasho: I'm much more interested in what students should be able to do than what they "need."

@Suresh: I'll reply on your blog, but essentially an elaboration of what I said above.

@Aaron: I don't think I propose that people shouldn't be able to study the formal properties of algorithms, just that we shouldn't lose sight of the fact that algorithms are an abstraction and most of the work is figuring out how to map your real world problem to something you can solve. And my opinion is that figuring out that mapping is often far more important than anything else, and often involves statistical analysis.

I agree with Sasho, and would add that in my view,...

2014-10-06T05:36:41.746-06:00

I agree with Sasho, and would add that in my view, "Machine Learning" defines a class of problems, and the solutions to those problems are algorithms, some of which can/should be taught in algorithms classes (though others not). Like any other class of problems, you can have solutions which consist of heuristics, or solutions with provable guarantees. An algorithms class teaches you how to reason about the latter, but there is no reason you shouldn't prove a margin-based bound on the convergence rate of the perceptron algorithm, study the convergence properties of gradient descent more generally (or multiplicative weights), and analyze Adaboost.

There is a good case to be made that algorithms classes should be updated to have less focus on classical discrete optimization problems, and to have more focus on convex optimization problems, but I don't see the case at all for eliminating the formal study of algorithms.

Ok you asked for it :) http://geomblog.blogspot.c...

2014-10-06T02:04:01.569-06:00

Ok you asked for it :)

http://geomblog.blogspot.com/2014/10/algorithms-is-new-algorithms.html

Consider the guantlet returned.

Algorithms is basically a math course, I wouldn...

2014-10-06T01:32:11.746-06:00

Algorithms is basically a math course, I wouldn't throw it out. In the same sense that you need discrete math, linear algebra, and probability, you need dynamic programming and greedy algorithms (matroids, submodularity), they give you the foundations of, among others, several learning methods. That said, ML should probably be given in the second year, or even as part of algorithms, for example, by using ML algorithms as examples to instantiate the generic techniques.

This is based on a strange dichotomy. Formulating ...

2014-10-05T21:36:19.588-06:00

This is based on a strange dichotomy. Formulating good models and writing good algorithms are independent (although probably not orthogonal) concerns. You are saying that a good model gives you more mileage than a good algorithm, but this is largely an apples and oranges comparison. You should want your students to be able to reason formally both about statistical models and about algorithms.

I also think the basic algorithms framework for posing and solving computational problems is still useful. Students need to be able to distinguish between a computational problem and an algorithm: k-means clustering is not the same as Lloyd's algorithm, the dictionary problem is not the same as hashing, etc. They also need to be able to tell if they are solving a problem optimally, or using a heuristic, and they need some framework to tell good heuristics from bad heuristics. And, as was pointed out, they need to have a basic understanding of complexity. Yes, maybe your final goal is hard to define, and your data is dirty, but after you decide on a methodology, and do the modeling, you still end up with a series of basic computational problems, and you still need to solve them.

That being said, I don't disagree that the common algorithms curriculum may need some updating to make it more relevant. Covering more heuristics for example might be a good idea. I think you exaggerate how "stable" the field of algorithms is. Maybe it's just the UG algorithms course that has become a little too stable.

@rrenaud: I imagine (based on teaching UG AI and U...

2014-10-04T08:43:38.099-06:00

@rrenaud: I imagine (based on teaching UG AI and UG ML) many people have had that experience.

@alvin: The complexity argument is good, but I guess the question is: could you do this in a more "realistic"/"worthwhile" setting?

I think your experience echoes a lot of what I've seen. Students (in general) love perceptron and hate logistic regression. They love Gibbs sampling and hate EM. I think a lot of this is because the "procedural" things are easy to pretend you understand because at least you can implement them. Once there are layers of abstraction (ok, we'll specify our objective function, set that to the side, and now write a generic optimizer), it gets confusing. IMO the fact that students don't learn to love abstraction in an algorithms class is a huge failure -- that should be exactly what that class is about. Heck it's what all CS classes should be about :).

First, I really like your "fancy algorithms a...

2014-10-03T19:49:33.141-06:00

First, I really like your "fancy algorithms always lose to better models." Before this, it was "fancy algorithms always lose to more data." But, more data without a correct model is just a waste of data collection (time/money/...).

To the "Machine learning is the new algorithms", I agree. There's machine learning everywhere like algorithm was tens years ago where a lot of people were still exciting with linear programming and simplex algorithm. Right now, I see them enjoying deep learning.

The principal concern I'd have is that, withou...

2014-10-03T12:39:47.467-06:00

The principal concern I'd have is that, without a foundation in traditional algorithms, one might not know how to analyze machine learning algorithms or how to formalize the problems to be approximated.

We all know (well, maybe) that closed-form solutions don't typically generalize to messy data, but I tend to view algorithms as an introduction to proofs in the context of computer science and a means of gaining an intuition about complexity.

As someone relatively new to the modeling culture, one of the biggest hurdles for me, I think, was (and perhaps still is) adjusting to thinking of problems purely in terms of the model. In computation, my first instinct is to think of things in terms of for loops and tables, even when we're talking about generative models. This has shifted over the past couple of years, but I guess that my point is that I wonder how much of this is a cultural difference.

Let's be optimistic and say that an undergraduate has taken CS2, Data Structures, and some version of Discrete Math, more-or-less understanding 80% of it. The student steps into Machine Learning and does fine with the probability, but then has to implement an algorithm in a real programming language. The student does something ridiculous because the student has little intuition about the fact that the three nested for loops being called on every input are a terrible idea, or just how much more terrible each new loop makes the complexity of the program. Personally, I found Theory of Computing and Algorithms to be helpful in learning to visualize not only complexity, but things like state spaces and complex traversals.

I think it'd be great ML is were common in undergraduate curricula, but I think that Algorithms should probably stay.

I totally agree with you. As an undergrad, I love...

2014-10-03T12:39:23.909-06:00

I totally agree with you.

As an undergrad, I loved algorithms. It was my favorite class. I did pretty well in programming contests (ACM world finalist) which are basically 80% algorithms and 20% coding.

And then I got to Google, and I was thrown onto a ranking/prediction task, and I flailed very badly at first until my project failed. Though I was working with statisticians (thankfully), I had a lot of trouble even understanding the stats behind really simple models . My lack of preparation for the job was almost humorous.

Later I went to grad school for an MS CS focused on Machine Learning (while working full time) and work on a classification system for local businesses. It was really relevant and useful.