Comments on natural language processing blog: Those Darn Biologists...

Your blog is great! Very informative! We will be s...

2008-01-10T09:32:00.000-07:00

Your blog is great! Very informative! We will be sure to link to you!

Judy and Harold

Hal,I'd have to agree with you but you have to loo...

2007-12-28T13:26:00.000-07:00

Hal,

I'd have to agree with you, but you have to look at it from a different perspective. If you are not in machine learning, how can you figure out which of these 25 dimensionality reduction techniques ( http://www.cs.unimaas.nl/l.vandermaaten/Laurens_van_der_Maaten/Matlab_Toolbox_for_Dimensionality_Reduction.html )
is the best for the type of problems you are most interested in? I can see why using PCA is a safe bet: it would convince people in the field and reviewers that the finding to be published is not a by-product of the new algorithm used.

Igor.
http://nuit-blanche.blogspot.com

I agree with Bob Carpenter. People have different va...

2007-12-27T21:52:00.000-07:00

I agree with Bob Carpenter.
People have different values. I mean they will attach more weight to a method or technique depending on how it fits into their overall scheme of things.
If the goal is a research paper, then you could try a 'mashup' of sorts to increase performance from 99 to 99.5.

In practice I would prefer to use Peter Drucker's value scheme: 'It is more important to do the right thing than to do the thing right.' If this value is used, project delivery would get 8 out of 10. POS tagging of all varieties would probably get 3.
In one of your earlier posts you talked about 'confusion matrices' (?) for POS tagging.
The question is how much marginal value (toward the goals) a marginal improvement adds.

I should also add that in many areas of CS, the sa...

2007-12-27T15:09:00.000-07:00

I should also add that in many areas of CS, the same holds: namely, certain techniques take hold and root themselves, even if newer techniques come along with improvements: the argument again is that the focus is on the problem, rather than the technique, and so you need a dramatic improvement, or a technique snuck in via a new problem.

More generally, this goes to the fact that large parts of ML, and large parts of algorithms for that matter, are driven by techniques rather than problems, whereas in the outside "problem-centric" world, there's a much bigger focus on generalist solutions.

I've been working closely with lab biologists and ...

2007-12-27T14:11:00.000-07:00

I've been working closely with lab biologists and I've started to publish with them. We should distinguish between "methods" papers and "results" papers. Our two main papers so far (one published, the other in press) are on new gene prediction methods. The reviews were pretty much what I would expect from ACL reviewers, except that they were much more detailed and required substantial work to answer in full. The first paper was published in a results-oriented journal (PLoS Comp Bio); the other has been accepted by a methods journal (Bioinformatics). Some of our most recent work may lead to specific, experimentally confirmed biological results, which we will submit to the appropriate results journals. Even methods papers may make it into results-oriented journals if the new method is seen by the editor and reviewers as changing the game in a particular area. This has been the case with recent work on discriminative structured gene prediction methods by the Rätsch lab at Max Planck Tübingen and by my group (publications in PLoS Comp Bio), and by the Batzoglou lab at Stanford and the Galagan lab at the Broad Institute (publications in Genome Research).

Lab biologists care about solving biology problems...

2007-12-27T12:52:00.001-07:00

Lab biologists care about solving biology problems, not machine learning, algorithms or statistics problems. They'll be hired, get tenure, get grants, get papers published, and win Nobel prizes for innovations in biology.

Time they spend learning about state-of-the-art machine learning and statistics is time they can't spend learning about biology.

Clustering and factorization techniques like PCA are exploratory data analysis tools used to formulate hypotheses. Biologists don't publish much based on data analysis; they have to go back in the lab and verify what they found in the statistics on the bench.

As an academic computational linguist, writing an ACL or NIPS paper is the end product. Parsing a section of the Penn treebank or participating in CoNLL is considered real work.

If, like me, you have to solve real problems for real customers, you'll find yourself using bigram language models, HMMs, naive Bayes, and other robust techniques even if they're not "state of the art". That's largely because they work well without a lot of feature engineering and run fast in the field in small memory footprints.

This is interesting. It would be a little tricky t...

2007-12-27T12:52:00.000-07:00

This is interesting. It would be a little tricky to get a paper into FOCS/STOC/SODA where the main contribution was to improve some bound on a problem (unless the improvement was substantial, came after a lot of effort, introduced a new technique, etc).

On the other hand, although introducing a new problem can be tricky, lots of new problems get introduced into algorithms conferences, if done the right way. So there's a slightly different sensibility even within CS (to the extent that *ACL is "CS").