19 January 2006

Structured Prediction 2: My Perspective

In Structured Prediction 1: What's out there?, I discussed current SP technology: Markov fields, CRFs, M3Ns and SVMISOs. Here I'll discuss my work briefly. I'll come back to SP once more to discuss what seems to be on the horizon.

With so many options, why bother with a new one? There are problems that are not solved by the standard approaches. The biggest problem is the "argmax assumption," which states that you can solve the following problem:

arg max_{y in Y} f(x,y; w) = arg max_{y in Y} w*Phi(x,y)

where the equality holds under the standard linear predictor assumption. This maximization requires that, for a given input x, we can (efficiently) compute the highest-scoring output.
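To make this concrete, here is a minimal brute-force sketch of the argmax (my own illustration, in Python, with a toy hashed feature map standing in for Phi). Structured problems are precisely those where Y is far too large to enumerate like this, which is why the assumption has teeth:

import itertools
import numpy as np

def phi(x, y, dim=64):
    """Toy joint feature map: hashed (word, label) and (label, label) counts.
    Purely illustrative -- any Phi(x, y) works as long as the score is linear in w."""
    v = np.zeros(dim)
    for i, (word, label) in enumerate(zip(x, y)):
        v[hash((word, label)) % dim] += 1.0
        if i > 0:
            v[hash((y[i - 1], label)) % dim] += 1.0
    return v

def argmax_decode(x, labels, w):
    """The argmax assumption, done by brute force: enumerate every label
    sequence and return the one maximizing w . Phi(x, y). This is exponential
    in len(x); the Markov and context-free assumptions exist to avoid it."""
    best_y, best_score = None, float("-inf")
    for y in itertools.product(labels, repeat=len(x)):
        score = float(np.dot(w, phi(x, y)))
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# e.g. argmax_decode("the dog barks".split(), ["D", "N", "V"], np.random.randn(64))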

The problem is that for many (most?) NLP problems, without making completely unreasonable assumptions, this maximization is intractable. The standard assumptions that make this operation polynomial time are the Markov assumption for sequences and the context-free assumption for trees. But we know that both of these assumptions are invalid, and a few papers have tried to deal with this by using sampling, reranking or approximate inference techniques.
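For sequences, the Markov assumption is exactly what turns that exponential enumeration into a polynomial-time dynamic program. A minimal Viterbi sketch, assuming the score decomposes into per-position emission scores and adjacent-label transition scores (the array names are mine, not from any particular toolkit):

import numpy as np

def viterbi(emission, transition):
    """First-order Markov decoding.
    emission[i, k]   = score of label k at position i   (shape n x K)
    transition[j, k] = score of label j followed by k    (shape K x K)
    Runs in O(n K^2) only because the score decomposes over adjacent labels."""
    n, K = emission.shape
    delta = np.full((n, K), -np.inf)      # best score of any prefix ending in label k
    back = np.zeros((n, K), dtype=int)    # backpointers
    delta[0] = emission[0]
    for i in range(1, n):
        scores = delta[i - 1][:, None] + transition + emission[i][None, :]
        back[i] = scores.argmax(axis=0)
        delta[i] = scores.max(axis=0)
    y = [int(delta[-1].argmax())]
    for i in range(n - 1, 0, -1):
        y.append(int(back[i, y[-1]]))
    return y[::-1]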

On the other hand, if you look at what people in NLP actually do to solve problems, they use approximate search (MT is a good example). In this way, the search space can be tuned to precisely capture the functional (feature) dependencies needed.
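As a rough stand-in for that kind of decoder, here is a generic beam search sketch; expand, score and is_complete are hypothetical callbacks, and the point is that score is free to use arbitrary, non-decomposable features of the partial output:

import heapq

def beam_search(start_state, expand, score, is_complete, beam_width=10):
    """Approximate search over partial outputs.
    expand(state)      yields successor states (e.g. appending a translated phrase)
    score(state)       may use any features at all, Markovian or not
    is_complete(state) says whether the state is a full output
    Returns the best complete state found; there is no optimality guarantee."""
    beam = [start_state]
    best_complete = None
    while beam:
        candidates = []
        for state in beam:
            for nxt in expand(state):
                if is_complete(nxt):
                    if best_complete is None or score(nxt) > score(best_complete):
                        best_complete = nxt
                else:
                    candidates.append(nxt)
        beam = heapq.nlargest(beam_width, candidates, key=score)  # prune to the top scorers
    return best_complete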

My perspective is that if we're going to use search anyway, we should take this into account while learning. There is no point to learning a model that ranks the best output highest if we cannot actually find this output. And once we start using approximate search, any performance guarantees related to learning pretty much go out the window.

The really cool thing is that a completely different group of machine learning researchers has already considered learning in the context of search; they call it reinforcement learning. The key difference between SP and RL, as I see it, is that in RL you can affect the world, whereas in SP you cannot. In the standard "action/state/observation" terminology, in SP you get exactly one observation at the very beginning (the input x), while in RL observations keep arriving as you take more actions. That seems to be the fundamental difference between the two problems.

So, by treating SP in a search framework, what we basically want is a function that takes our input x and a partial output represented by a state s in our search space, and predicts the best action to take: the best thing to add onto the partial output. For instance, in sequence labeling with left-to-right decoding, we might predict the label of the next word. In MT, we might predict a French phrase to translate and its English translation, and append this to the translation prefix. The action set is potentially large, but not impossibly so (if it were, the problem would be intractable anyway).
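For left-to-right sequence labeling this looks roughly like the sketch below; the classifier and the feature extractor are placeholders I am inventing for illustration, and the only requirement is that the classifier can look at both the input and the partial output:

def decode_left_to_right(x, classifier):
    """Decoding as repeated action prediction: the state is the partial output
    (the labels predicted so far) and the action is the label of the next word."""
    partial = []
    for i in range(len(x)):
        feats = extract_features(x, partial, i)    # may inspect ALL of x and the whole prefix
        partial.append(classifier.predict(feats))  # action = label for word i
    return partial

def extract_features(x, partial, i):
    """Hypothetical feature extractor: current word, previous word and the
    previously predicted label -- but nothing forces a Markov restriction."""
    prev_label = partial[-1] if partial else "<s>"
    return {"w": x[i], "w-1": x[i - 1] if i > 0 else "<s>", "y-1": prev_label}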

The great thing is that this lets us formally cast structured prediction as a cost-sensitive multiclass classification problem. Lots of algorithms have been developed for the multiclass problem, and we can apply them to the SP problem through this reduction. This gives us simple, efficient algorithms with good practical performance and strong theoretical guarantees.

The key requirement that makes all this work (essentially my version of the argmax assumption) is that for a given input x, true output y and state s in the search space, we can compute the action a that is optimal to execute from s. This is the "optimal policy assumption." (We can actually weaken this assumption to require only a bound on the optimal policy, which is necessary for optimizing things like BLEU or ROUGE, at least under interesting models.) It is provably a strictly weaker assumption than the assumptions made by CRFs and M3Ns, essentially because it doesn't care about the feature space, which means we can have whatever the heck features we want (a good thing for NLP).
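To give the flavor of the reduction (this is only a sketch under the optimal policy assumption, not the full algorithm, and every argument below is a hypothetical placeholder): we walk through the search space, and at each state we query the optimal policy for the cost of every action, which yields one cost-sensitive multiclass example per state.

def make_costsensitive_examples(x, y_true, actions, initial_state, transition,
                                optimal_completion_loss, horizon):
    """Sketch of the reduction from SP to cost-sensitive multiclass classification.
    optimal_completion_loss(s, a, x, y_true) is the loss of the best output
    reachable after taking action a in state s -- computable exactly when the
    optimal policy assumption holds (or bounded when it only holds approximately)."""
    examples = []
    state = initial_state(x)
    for _ in range(horizon):
        costs = {a: optimal_completion_loss(state, a, x, y_true) for a in actions}
        examples.append((state, costs))           # one cost-sensitive example per state
        best_action = min(costs, key=costs.get)   # follow the optimal policy while collecting data
        state = transition(state, best_action)
    return examples

A cost-sensitive learner trained on these (state, cost vector) pairs then plays the role of the action predictor at decoding time.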
