natural language processing blog: Beating the state of the art, big-O style

04 February 2009

Beating the state of the art, big-O style

I used to know a fair amount about math; in fact, I even applied to a few grad schools to do logic (long story, now is not the time). I was never involved in actual math research (primarily because of the necessary ramp-up time -- thank goodness this doesn't exist as much in CS!), but I did get a sense from my unofficial undergrad advisor how things worked. The reason I started thinking of this is because I recently made my way about halfway through a quite dense book on long games by a prof from grad school (I cross-enrolled at UCLA for a few semesters). The basic idea of a countable game is that there is a fixed subset A of [0,1] (subset of the reals) and two players alternate play. In each move, they play an integer from 0 to 9. They play for a countably infinite number of moves, essentially writing down (in decimal form) a real number. If, at the "end", this number is in A, then player 1 wins; otherwise, player 2 wins. Both players know A. The set A is said to be determined if there is a strategy that will force a win; it is undetermined otherwise. A long game is the obvious generalization where you play for longer than countable time. The details don't matter.

This led me to think, as someone who's moved over to working a lot on machine learning: is there an analogous question for online learning? There are several ways to set this up and I won't bore you with the details (I doubt any reader here really cares), but depending on how you set it up, you can prove several relatively trivial, but kind of cute, results (I say trivial because they took me on the order of hours, which means that someone who knows what they're doing probably would see them immediately). I basically did this as a mental exercise, not for any real reason.

But it got me thinking: obviously machine learning people wouldn't care about this because it's too esoteric and not at all a realistic setting (even for COLTers!). I strongly doubt that logicians would care either, but for a totally different reason. From my interaction, they would be interested if and only if two things were satisfied: (a) the result showed some interesting connection between a new model and existing models; (b) the proofs were non-trivial and required some new insight that could be then applied to other problems. Obviously this is not my goal in life, so I've dropped it.

This led to me introspect: what is is that we as a community need in order to find some result interesting? What about other fields that I claim to know a bit about?

Let's take algorithms for a minute. Everything here is about big-O. Like the math types, a result without an interesting proof is much less interesting than a result with an interesting proof, though if you start reading CS theory blogs, you'll find that there's a bit of divide in the community on whether this is good or not. But my sense (which could be totally broken) is that if you have a result with a relatively uninteresting proof that gets you the same big-O running time as the current state of the art, you're in trouble.

I think it's interesting to contrast this with what happens in both NLP and ML. Big-O works a bit differently here. My non-technical description of big-O to someone who knows nothing is that it measure "order of magnitude" improvements. (Okay, O(n log n) versus O(n log log n) is hard to call an order of magnitude, but you get the idea.) An equivalent on the experimental side would seem to be something like: you cut the remaining error on a problem by half or more. In other words, if state of the art is 60% accuracy, then an order of magnitude improvement would be 80% accuracy or better. At 80% it would be 90%. At 90% it would be 95% and so on. 90.5% to 91.2% is not order of magnitude.

I actually like this model for looking at experimental results. Note that this has absolutely nothing to do with statistical significance. It's kind of like reading results graphs with a pair of thick glasses on (for those who don't wear glasses) or no glasses on (for those who wear think glasses). I think the justification is that for less than order of magnitude improvement, it's really just hard to say whether the improvement is due to better tweaking or engineering or dumb luck in how some feature was implemented or what. For order of magnitude improvement, there almost has to be something interesting going on.

Now, I'm not proposing that a paper isn't publishable if it doesn't have order of magnitude improvement. Very few papers would be published this way. I'm just suggesting that improving the state of the art not be -- by itself -- a reason for acceptance unless it's an order of magnitude improvement. That is, you'd either better have a cool idea, be solving a new problem, analyzing the effect of some aspect of a problem that's important, etc., or work on a well trod task and get a big-O improvement.

What I'm saying isn't novel, of course... the various exec boards at the ACL conferences have been trying to find ways to get more "interesting" papers into the conferences for (at least) a few years. This is just a concrete proposal. Obviously it requires buy-in at least of area chairs and probably reviewers. And there are definitely issues with it. Like any attempt to make reviewing non-subjective, there are obviously corner cases (eg., you have a sort-of-interesting idea and an almost-order-of-magnitude improvement). You can't mechanize the reviewing process. But frankly, when I see paper reviews that gush over tiny improvements in the state of the art in an otherwise drab paper, I just get annoyed :).

21 comments:

Suresh Venkatasubramanian04 February, 2009 19:42
Although, even in algorithms land, it's about big-O if the problem is interesting. If not, that doesn't help that much. I think ultimately in theoryCS (as in much of math), interestingness is a kind of PageRank: the more links from the result to other results, the better.

Or you could look at the LONG list that Terry Tao once made
ReplyDelete
Replies
Anonymous12 May, 2009 10:27
酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店酒店兼差PRETTY GIRL酒店公關酒店小姐彩色爆米花酒店兼職,酒店工作彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀彩色爆米花
ReplyDelete
Replies
Anonymous15 June, 2009 01:58
hey i totally agree to your point. I also came across some papers recently were there was a small tweaking done to the existing work.. I feel this isn't the breakthrough research... Anyways Nice Blog.. I follow your blog :)
ReplyDelete
Replies
Nightlife Web Directory05 July, 2009 19:59
I really liked your blog! You have some great content. Check out my blog and give me some feedback Please come visit my site Free Adult Entertainment when you got time.
ReplyDelete
Replies
Advertising Agencies05 July, 2009 20:01
Well, nice article buddy… Someone will love to read this infor if I tell her about this. She’s really interested in this subject. Thanks again… Please come visit my site EadvertisingNetwork when you got time.
ReplyDelete
Replies
hotel rooms18 July, 2009 07:19
This is just another reason why I like your website. I like your style of writing you tell your stories without out sending us to 5 other sites to complete the story. Please come visit my site Motelhotel Business Directory when you got time.
ReplyDelete
Replies
personal training22 July, 2009 09:21
Comprehensive web resource of Aerobics Instruction, Classes & Instruction, Exercise & Fitness Clothing Retail, Exercise & Fitness Equipment & Supplies Dealers, Exercise & Fitness Equipment & Supplies Wholesale & Manufacturers, Exercise & Fitness Equipment Rental, Exercise & Fitness Equipment Service & Repair, Fitness Consultants, Gymnasium Equipment & Supplies Dealers, Gymnastics Instruction.
ReplyDelete
Replies
custom homes30 July, 2009 02:28
Really great work. Congrats to everyone who are involved with this project. The website layout and graphics are really cool. Please come visit my site Realtor Home Builders Directory when you got time.
ReplyDelete
Replies
Anonymous03 November, 2009 04:24
Thanks for sharing,
Dissertation | Essay | Research Paper
ReplyDelete
Replies
Anonymous15 December, 2009 00:06
搬运车
 堆高车
 油桶车
 手动液压搬运车
 液压托盘搬运车
 搬运车
 液压搬运车
 手动搬运车
 托盘搬运车
 手动液压托盘搬运车
 不锈钢搬运车
 镀锌搬运车
 电子秤搬运车
 称重式搬运车
 搬运地牛
 手动液压拖板车
 迷你搬运车
 重载搬运车
 超长搬运车
 超宽搬运车
 低放搬运车
 高起升搬运车
 剪式搬运车
 电动搬运车
 半电动搬运车
 平台车
 手动液压平台车
 电动平台车
 液压平台车
 油桶车
 圆桶车
 油桶搬运车
 油桶装卸车
 圆桶装卸车
 油桶夹
ReplyDelete
Replies
Anonymous15 December, 2009 00:10
堆高车
 手动堆高车
 手动叉车
 堆垛车
 手动堆垛车
 手动叉车
 半电动堆高车
 半电动堆垛车
 电动堆高车
 电动堆垛车
 堆高机
 叉车
 电动叉车
 内燃叉车
 液压升降平台
 电动升降平台
 移动式登车桥
 固定式登车桥
ReplyDelete
Replies
deLi26 December, 2009 09:47
sohbet odaları
sohbet
yonja
chat siteleri
forum siteleri
toplist ekle
sohbet
yonja
netlog
sohbet
kizlarla sohbet
sohbet
sohbet
dini sohbet
islami sohbet
chat
sohbet chat
mirc indir
cinsel sohbet
porno izle
camfrog indir
lida
kurumsalseo.com R10 lida fx15 pohudey zayıflama
ReplyDelete
Replies
Anonymous31 January, 2010 23:45
sick. well done r-mean. keep it up
getse hayoutioune

Thesis Writing | Dissertation Writing | Essay Writing | Assignment Writing
ReplyDelete
Replies
Anonymous09 February, 2010 22:31
absense of mind can do things without having reasons.. one can say that

Thesis Help | Dissertation Help | Essay Help | Assignment Help
ReplyDelete
Replies
Paul27 February, 2010 04:23
Recently I came across your blog and have been reading along. I think I would leave my first comment. I do not know what to say, but I like reading. Good blog. I will continue to visit the blog very often.
Squidoo Lens Creation Social Media News Squidoo Lens Creation Link Building Services
ReplyDelete
Replies
dissertations quality01 March, 2010 23:59
Your all thought is very helpful and I good feel read your blog

Thanks
ReplyDelete
Replies
Anonymous18 March, 2010 01:51
Hi,
This is inspiring; I am very pleased by this post. Nice work, thanks for such information.
ReplyDelete
Replies
gamefan1215 April, 2010 09:48
I think this is so good and definitely benefit a lot of people if they use it. I think this is so good.
fort lauderdale auto accident attorney
ReplyDelete
Replies
speed dating in new york26 May, 2010 23:30
As a newbie, this article was really helpful, thanks for sharing!
ReplyDelete
Replies
daniel john01 June, 2010 03:17
You made some good points there. I did a search on the topic and found most people will agree with your blog.

Term papers
ReplyDelete
Replies
combattery8407 July, 2010 00:27
IBM ThinkPad R60 Battery
IBM ThinkPad T60 Battery
IBM ThinkPad T41 Battery
IBM ThinkPad T43 Battery
IBM ThinkPad X40 Battery
Thinkpad x24 battery
ThinkPad G41 battery
IBM thinkpad r52 battery
Thinkpad x22 battery
IBM thinkpad t42 battery
IBM thinkpad r51 battery
Thinkpad r50 battery
IBM thinkpad r32 battery
Thinkpad x41 battery
SONY VGP-BPS2 Battery
SONY VGP-BPS2C Battery
SONY VGP-BPS5 battery
SONY VGP-BPL2C battery
SONY VGP-BPS2A battery
SONY VGP-BPS2B battery
SONY PCGA-BP1N battery
SONY PCGA-BP2E battery
SONY PCGA-BP2NX battery
SONY PCGA-BP2S battery
SONY PCGA-BP2SA battery
SONY PCGA-BP2T battery
SONY PCGA-BP2V battery
SONY PCGA-BP4V battery
SONY PCGA-BP71 battery
SONY PCGA-BP71A battery
SONY VGP-BPL1 battery
SONY VGP-BPL2 battery
ReplyDelete
Replies

Add comment