natural language processing blog: A technique for me is a task for you

Originally in the context of Braque and now in the context of FUSE, I've thought a bit about understanding the role of techniques and tasks in scientific papers (admittedly, mostly NLP and ML, which I realize are odd and biased). I worked with Sandeep Pokkunuri, a MS student at Utah, looking at the following problem: given a paper (title, abstract, fulltext), determine what task is being solved and what technique is being used to solve it. For instance, a paper like "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data" the task would be "segmenting and labeling sequence data" and the technique would be "conditional random fields."

You can actually go a long way just looking for simple patterns in paper titles, like "TECH for TASK" or "TASK by TECH" and a few things like that (after doing some NP chunking and clean-up). From there you can get a good list of seed tasks and techniques, and could conceivably bootstrap your way from there. We never got a solid result out of these, and sadly I moved and Sandeep graduated and it never went anywhere. What we wanted to do was automatically generate tables of "for this TASK, here are all the TECHs that have been applied (and maybe here are some results) oh and by the way maybe applying these other TECHs would make sense." Or visa-verse: this TECH has been applied to blah blah blah tasks. You might even be able to tell what TECHs are better for what types of tasks, but that's quite a bit more challenging.

At any rate, a sort of "obvious in retrospect" thing that we noticed was that what I might consider a technique, you might consider a task. And you can construct a chain, typically all the way back to math. For instance, I might consider movie recommendations a task. To solve recommendations, I apply the technique of sparse matrix factorization. But then to you, sparse matrix factorization is a task and to solve it, you apply the technique of compressive sensing. But to Scott Tanner, compressive sensing is a task, and he applies the technique of smoothed analysis (okay this is now false, but you get the idea). But to Daniel Spielman, smoothed analysis is the task, and he applies the technique of some other sort of crazy math. And then eventually you get to set theory (or some might claim you get to category theory, but they're weirdos :P).

(Note: I suspect the same thing happens in other fields, like bio, chem, physics, etc., but I cannot offer such an example because I don't know those areas. Although not so obvious, I do think it holds in math: I use the proof technique of Shelah35 to prove blah -- there, both theorems and proof techniques are objects.)

At first, this was an annoying observation. It meant that our ontology of the world into tasks and techniques was broken. But it did imply something of a richer structure than this simple ontology. For instance, one might posit as a theory of science and technologies studies (STS, a subfield of social science concerned with related things) that the most basic thing that matters is that you have objects (things of study) and an appliedTo relationship. So recommender systems, matrix factorization, compressive sensing, smoothed analysis, set theory, etc., are all objects, and they are linked by appliedTos.

You can then start thinking about what sort of properties appliedTo might have. It's certainly not a function (many things can be applied to any X, and any Y can be applied to many things). I'm pretty sure it should be antireflexive (you cannot apply X to solve X). It should probably also be antisymmetric (if X is applied to Y, probably Y cannot be applied to X). Transitivity is not so obvious, but I think you could argue that it might hold: if I apply gradient descent to an optimization problem, and my particular implementation of gradient descent uses line search, then I kind of am applying line search to my problem, though perhaps not directly. (I'd certainly be interested to hear of counter-examples if any come to mind!)

If this is true, then what we're really talking about is something like a directed acyclic graph, which at least at a first cut seems like a reasonable model for this world. Probably you can find exceptions to almost everything I've said, but that's why you need statistical models or other things that can deal with "noise" (aka model misspecification).

Actually something more like a directed acyclic hypergraph might make sense, since often you simultaneously apply several techniques in tandem to solve a problem. For instance, I apply subgradient descent and L1 regularization to my binary classification problem -- the fact that these two are being applied together rather than separately seems important somehow.

Not that we've gone anywhere with modeling the world like this, but I definitely thing there are some interesting questions buried in this problem.

2 comments:

rrenaud02 October, 2011 20:30
Here is a nice problem where X is used to solve Y, and Y is used to solve X.

Here is a case where LCA is used to solve RMQ, and then restricted RMQ is used to solve LCA.

RMQ -> LCA -> Restricted RMQ

http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
Anonymous20 October, 2011 00:14
Hi Hal,

Nice post. I've also been looking into how discussion of tasks might be separated from the techniques used to solve them. It hadn't occurred to me that task vs. technique was viewpoint-sensitive.

By the way, I am co-organising a workshop on relating machine learning problems at NIPS this year:

http://rml.cecs.anu.edu.au/

I hope you can make it to the workshop as the discussion session could really benefit from your input given you've been working along these lines.

29 September 2011

A technique for me is a task for you

2 comments: