05 June 2007

Tracking the State of the Art

I just received the following email from Yoav Goldberg:

I believe a much-needed resource in the ACL community is a "state-of-the-art repository": a public location in which one can find information about the current state-of-the-art results, papers and software for various NLP tasks (e.g. NER, Parsing, WSD, PP-Attachment, Chunking, Dependency Parsing, Summarization, QA, ...). This will help newcomers to the field get a feel for "what's available" in terms of both tasks and tools, and will allow active researchers to keep current on fields other than their own.

For example, I am currently quite up to date with what's going on in parsing, PoS tagging and chunking (and of course the CoNLL shared tasks are great when available, yet in many cases not updated often enough), but I recently needed to do some Anaphora Resolution,
and was quite lost as to where to start looking...

I think the ACL Wiki is an ideal platform for this, and if enough people show some interest, I will create a "StateOfTheArt" page and start populating it. But before I do that, I would like to (a) know if there really is interest in something like this and (b) hear any comments you might have about it (how you think it should be organized, what its scope should be, how it can be advertised other than on this blog, etc).
I find this especially amusing because this is something that I'd been planning to blog about for a few weeks and just haven't found the time! I think that this is a great idea. If we could start a community effect where every time you publish a paper with new results on a common task, you also publish those results on the wiki, it would make life a lot easier for everyone.

I would suggest that the pages essentially consist of a table with the following columns: paper reference (and link), scores in whatever the appropriate metric(s) are, and a brief description of extra resources used. If people feel compelled, they would also be encouraged to write a paragraph summary under the table with a bit more detail.

I would certainly use this, and to support the effort, I would be happy to go back through all my old papers and post their results on this page. It would be nice if someone (Yoav perhaps???) could initialize pages for the main tasks, so that that burden is lifted.

I'm sure other suggestions would be taken to heart, so comment away!

23 comments:

Peter Turney said...

As one of the people who initiated the ACL Wiki, I can say that this use of the wiki is entirely compatible with my vision for it. I've already done something along these lines here:

http://aclweb.org/aclwiki/index.php?title=TOEFL_Synonym_Questions

http://aclweb.org/aclwiki/index.php?title=SAT_Analogy_Questions

Panos Ipeirotis said...

This is a great idea. In general, wikis seem to be a great medium for keeping
track of the "state of the art" in any field. There are the appropriate
incentives for individual authors to post their own results in such a wiki, so
this seems to be a self-sustaining approach. Whoever believes they have the
best tool for some task can post their entry and gain visibility.

I was thinking of doing the same for survey papers that summarize the state of
the art in a particular field. (See the
related blog entry.
)

One of the issues raised for maintaining such "state of the art" lists was the
lack of support from current wikis for adding semantically meaningful links
that can connect the different papers, techniques, tools, and so on. (e.g.,
tool A "complements" tool B, tool C "outperforms" tool D). Still, I believe that
this approach has potential.

Peter Turney said...

The mandate of the ACL Wiki is "to facilitate the sharing of information on all aspects of Computational Linguistics". Survey papers and state-of-the-art-repositories fit the mandate perfectly. As they say at Wikipedia, "Be bold!"

http://en.wikipedia.org/wiki/Wikipedia:Be_bold_in_updating_articles

Fernando Pereira said...

One worry with this proposal is that published results do not define the state-of-the-art; reproduced results are what is needed. All too often, published results are not reproducible or very difficult to reproduce. I have seen instances of papers that were rejected because their results were not better than a "state-of-the-art" that no one could reproduce. At the very least, state-of-the-art status requires published code and data that will yield the state-of-the-art results. That is not the standard in our field yet.

Yoav said...

Good to know there is interest in this proposal!

I agree with Fernando that reproducible results are far more important than claimed results, and I think an "Available Software" column in the listing can go a long way toward solving this issue.

Another issue that I would like to hear comments about before I bootstrap some pages in the wiki is how to deal with similar-yet-different tasks. Three instantiations of this are (1) tasks that really have a lot in common, or that subsume each other (e.g. NP Bracketing vs. NP Chunking vs. Chunking). (2) Different learning frameworks (e.g. Rule based vs. Supervised vs. Semi Supervised vs. Unsupervised). And (3) languages other than English.

How should these be organized? Should they be considered the same task? A completely different task? Subtasks in some kind of a hierarchy? Any other suggestions?

Koos said...

Funny this should come up. I am trying to do the very same thing Hal is discussing here, and let me tell you, it's a mess out there. What should I read up on while unemployed?

At UC Berkeley they parse really fast, and they have a great POS tagging demo. Wow. Should I quickly learn their techniques?

I am reading Bikel et al.'s NER extraction paper. Great paper, but a thorough understanding of HMMs is required. Once you get the main technique, does the remainder of the paper (endless smoothing formulae with lambdas) just contain lab-specific solutions, and is it worth wading through?

What is the basic knowledge an NLPer out on the job market needs, anyway?

I have tried to sift out what *I* think are the highlights of the past 10 years, and really, there are not that many. I said 'highlights'; that does not mean I frown on all the intense research as having no potential highlights.

One thing that struck me is that, for instance, Jurafsky and Gildea are working on lexical semantics, but this was tackled at BBN 14 years ago. Penelope Sibun, in what is almost an afterthought in Cutting's Xerox paper, claims good results relating arguments and assigning semantic roles.

I am lost. My interpretation of all this is: there is not all that much ground-breaking innovation, and if there is, we don't know it yet (or I don't know it yet).

I have tried to put all this together on a fledgling set of webpages. If you think that's a contribution to this conversation, great.

http://www.geocities.com/koos_wilt/TheLinguisticsPages/intro.html

Ryan McDonald said...

koos wrote:

"One thing that struck me is that, for instance, Jurafsky and Gildea are working on lexical semantics, but this was tackled at BBN 14 years ago. Penelope Sibun, in what is almost an afterthought in Cutting's Xerox paper, claims good results relating arguments and assigning semantic roles.

I am lost. My interpretation of all this is: there is not all that much ground-breaking innovation, and if there is, we don't know it yet (or I don't know it yet)."

I suggest you re-read the Cutting paper and the latest papers on semantic role labeling (perhaps the work of Pradhan et al.). The Cutting paper reports 80% accuracy on a coarse-grained classification task, whereas modern papers report 90%+ on a more fine-grained classification task.

I am not sure where you get the impression that there has been no ground-breaking research. What about machine translation? Systems have gone from language-specific and totally unusable to robust, language-general and very much useful (though with many more improvements still needed).

Discriminative models, rich feature sets and other developments have led to named-entity taggers with accuracies above 90%. This is not only for simple categories like people's names and places, but also for complex entity types like genes and chemical compounds. Entity taggers are so robust today that they can often be (and are) used out-of-the-box for many real-world applications.

Similar improvements have been made in parsing, word sense disambiguation, generation, discourse analysis, relation extraction, co-reference resolution, etc.

It might be true that it is rare for a single paper to be considered "ground breaking innovation". However, I think it is simplistic to expect that. Language is complex and difficult. Though we want our solutions to ultimately be as simple as possible, we should expect the path by which we reach those solutions to also be complex and, as a result, incremental. When taken as a whole, I think it would be hard to argue that the body of research over the past 10 years has not been innovative.

An interesting take on incremental research can be seen in a post by Fernando Pereira.

hal said...

yoav -- available software is a big plus. i'm not sure how to handle the similar tasks -- a reasonably dense linking structure might be the way to go. imo, you should make it so that it is as easy as possible for people to add their info, even if this makes it slightly harder to find. if it's hard to enter, no one will and it will be useless. if it's easy to enter but hard(er) to find, then it's still better than combing 100s of papers, so there's still benefit.

koos/ryan: i think ryan is right. a lot of times it's somewhat hard to track progress because the problems are a bit amorphous. the same problem goes by different names; similar yet different problems by the same name. i would say that while there have been few papers over the past decade that alone have been amazingly groundbreaking, the sum progress is huge. i'm oversimplifying here, but 10 years ago things didn't work at all. today many things work well enough.

Koos said...

Ryan wrote in response to my posting: I suggest you re-read the Cutting paper and the latest papers on semantic role labeling (perhaps the work of Pradhan et al.). The Cutting paper reports 80% accuracy on a coarse-grained classification task, whereas modern papers report 90%+ on a more fine-grained classification task.

My reply: Thank you for your reaction (and man, do my typos look embarrassing). Please realize my post should be taken in the spirit of this discussion, which I interpret to be "how can we see the forest for the trees?"

Ryan wrote in response to my posting: I am not sure where you get the impression that there has been no ground-breaking research.

My reply: There are a number of reasons why I have that impression, the main one being *I* am having a hard time seeing the forest for the trees. (This implies others may not have a similarly hard time).

As an 'industrial linguist', but not one working at a major research lab, it is hard for me to determine which particular line of research is important and will bear fruit in the (near) future. I might have formulated my anguish ( :) ) as a question very much in keeping with this particular topic: how will any serious researcher determine which papers/lines of research are the Church's, the Cutting et al.'s and the Weischedels of the present? In other words, I am not saying there is no progress per se (I did say that verbatim, but phrased it awkwardly); I am asking: what is the most effective way for an 'industrial linguist' to stay informed of significant research?

Ryan wrote in response to my posting: What about machine translation? Systems have gone from language specific and totally unusable to robust, language general and very much useful (though with many more improvements still needed).

My reply: I am all-too-happy to hear it, having done some actual work in MT. And yes, it used to be an intractable problem. My current interest, however, lies in working with other textual technologies.

Ryan wrote in response to my posting: Discriminative models, rich feature sets and other developments have led to named-entity taggers with accuracies above 90%. This is not only for simple categories like people's names and places, but also for complex entity types like genes and chemical compounds. Entity taggers are so robust today that they can often be (and are) used out-of-the-box for many real-world applications.

My reply: I am aware of this, but, in a way, my awareness is too dim. And that's in keeping with the purpose of this particular conversation: how do we see the forest for the trees?

Ryan wrote in response to my posting: It might be true that it is rare for a single paper to be considered "ground breaking innovation". However, I think it is simplistic to expect that. Language is complex and difficult. Though we want our solutions to ultimately be as simple as possible, we should expect the path by which we reach those solutions to also be complex and, as a result, incremental. When taken as a whole, I think it would be hard to argue that the body of research over the past 10 years has not been innovative.

My reply: You are absolutely correct in the previous paragraph. Again, though, my question is: "How do we in the field, with CTOs and CEOs who expect results, effectively wade through the deluge of papers and information to keep up?" There are several routes one can take along which to read and study incrementally.

Again, take my web visit to Berkeley as an example. The demo there is downright impressive. The tagger is incredibly fast, and the parser even faster. It's also accurate, and it deals with unseen data. Does this imply I should start reading their every research paper? Of course not, but then what *should* I read? Again, that seems to be what this conversation is supposed to address, correct?

Yoav said...

OK, I created a new Wiki category called "State of the Art", with a link from the ACL Wiki main page. I populated it with skeletons for some core NLP tasks, and started filling in some of the entries (for now some POS tagging and some Parsing; more will follow soon).

Contributions and updates are of course welcome!

Koos said...

Thank you, Yoav. Being a novice to this blog, could you tell me where this 'state of the art' link is?

-Koos

Yoav said...

koos -- it's not a part of the blog, but of the ACL wiki. Here's the url of the Wiki's main page: http://aclweb.org/aclwiki/index.php?title=Main_Page

Koos said...

Looks very useful, Yoav (and others). I have about 2-3 hours a week I could spend as a volunteer.
Is there anything I could do to help with this?

Bob Carpenter said...

I checked out the new Wiki for results. In the POS tagging entry, I noticed Libin Shen et al.'s new tagging paper from ACL '07.

It reports an improvement from Toutanova et al.'s 97.24 to 97.33 on the same old sections of the treebank (testing on sections 22-24). I can't afford the treebank, so I'm just estimating here, but there are about 1M words in about 25 sections, so the test set is only about 120K words.

A simple binomial hypothesis test would put a one-sigma confidence interval at sqrt(.97 * (1 - .97) / 120,000), or about 0.0005. The 95% confidence interval would be 2 sigma, or about .001, or about .1%, which is just about the improvement noted in the paper.

So is the result "significant"? No, it's not, because the confidence interval is still too fat. For it to be a true confidence interval, the tests would have to be taken at random. But they're not -- they're all taken from sections 22-24 of the Treebank, in which there are all kinds of temporal and topical dependencies within the whole articles making up the corpus. For instance, the same phrase shows up again and again referring to a person, but the evals treat the occurrences as independent.

Another assumption is that we don't build gazillions of systems and then choose the best one post hoc. A multi-way significance eval would be much stricter.
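[Ed.: Bob's back-of-the-envelope arithmetic is easy to check in a few lines. This is only a sketch of his calculation; the 0.97 accuracy and 120K-word test-set size are his rough figures, not exact values from the papers.]

```python
import math

p = 0.97        # rough baseline tagging accuracy
n = 120_000     # rough estimate of test-set size (WSJ sections 22-24)

# One-sigma standard error of accuracy under a binomial model,
# which assumes every test token is an independent draw.
sigma = math.sqrt(p * (1 - p) / n)

# Two-sigma (~95%) half-width around the reported accuracy.
half_width = 2 * sigma

# Reported gain: Shen et al. (97.33) minus Toutanova et al. (97.24).
gain = 0.9733 - 0.9724

print(f"sigma         = {sigma:.4f}")       # about 0.0005
print(f"95% half-width = {half_width:.4f}")  # about 0.001
print(f"reported gain  = {gain:.4f}")        # about 0.0009
```

The reported gain falls inside the two-sigma half-width, and since adjacent tokens in the test set are not actually independent draws, the true interval is even wider, which is exactly Bob's point.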

I don't mean to pick on Shen et al. I had the same reaction to Michael Collins's paper on improving his parser some fractional degree. And often reimplementations of the same "idea" have this much noise in them (e.g. Bikel's reimplementation of Collins's parser).

This points to a problem with how our field understands significant improvements. I've had papers rejected for not evaluating on a "standard" test set, even when there wasn't one.

Finally, I'd like to plead for memory and time reporting alongside results, ideally with the amount of human effort spent on feature tweaking. When I'm shopping for a technique for a commercial app, these are overriding concerns that dwarf 0.001 improvements in accuracy on an "easy" test set that matches the training data. In that vein, I'd love to see results on words not in the training set.

Jan said...

Wouldn't it also be nice to have information about the language for which the results were obtained? I'm new to this field, but I assume most results are language-dependent, and I can also imagine that there are languages for which performance will lag behind forever in comparison to, for example, English. Moreover, I agree that reproducibility is crucial, so I think it would also be nice to have an indication of whether and where the results have been reproduced.

Anonymous said...

網頁設計,情趣用品,情趣用品,情趣用品,情趣用品
色情遊戲,寄情築園小遊戲,情色文學,一葉情貼圖片區,情惑用品性易購,情人視訊網,辣妹視訊,情色交友,成人論壇,情色論壇,愛情公寓,情色,舊情人,情色貼圖,色情聊天室,色情小說,做愛,做愛影片,性愛

免費視訊聊天室,aio交友愛情館,愛情公寓,一葉情貼圖片區,情色貼圖,情色文學,色情聊天室,情色小說,情色電影,情色論壇,成人論壇,辣妹視訊,視訊聊天室,情色視訊,免費視訊,免費視訊聊天,視訊交友網,視訊聊天室,視訊美女,視訊交友,視訊交友90739,UT聊天室,聊天室,豆豆聊天室,尋夢園聊天室,聊天室尋夢園,080聊天室,080苗栗人聊天室,女同志聊天室,上班族聊天室,小高聊天室 

AV,AV女優
視訊,影音視訊聊天室,視訊交友
視訊,影音視訊聊天室,視訊聊天室,視訊交友,視訊聊天,視訊美女

. said...

酒店經紀PRETTY GIRL 台北酒店經紀人 ,禮服店 酒店兼差PRETTY GIRL酒店公關 酒店小姐 彩色爆米花酒店兼職,酒店工作 彩色爆米花酒店經紀, 酒店上班,酒店工作 PRETTY GIRL酒店喝酒酒店上班 彩色爆米花台北酒店酒店小姐 PRETTY GIRL酒店上班酒店打工PRETTY GIRL酒店打工酒店經紀 彩色爆米花

酒店上班請找艾葳 said...

艾葳酒店經紀提供專業的酒店經紀,酒店上班,酒店打工、兼職、酒店相關知識等酒店相關產業服務,想加入這行業的水水們請找專業又有保障的艾葳酒店經紀公司!
艾葳酒店經紀是合法的公司、我們是不會跟水水簽任何的合約 ( 請放心 ),我們是不會強押水水辛苦工作的薪水,我們絕對不會對任何人公開水水的資料、工作環境高雅時尚,無業績壓力,無脫秀無喝酒壓力,高層次會員制客源,工作輕鬆。
一般的酒店經紀只會在水水們第一次上班和領薪水時出現而已,對水水們的上班安全一點保障都沒有!艾葳酒店經紀公司的水水們上班時全程媽咪作陪,不需擔心!只提供最優質的酒店上班環境、上班條件給水水們。

seldamuratim said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chatkamerali sohbetseslisohbetsesli sohbet sitelerisesli chat siteleriseslichatsesli sohpetseslisohbet.comsesli chatsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet
seslisohbetsesli sohbetkamerali sohbetsesli chatsesli sohbetkamerali sohbet

cilemsin42 said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it to a few friends of mine that I know would enjoy reading..
sesli sohbetsesli chat
sesli sohbet siteleri

sesli chat siteleri sesli sohbetsesli chat
sesli sohbet siteleri
sesli chat siteleri
SesliChat
cılgın sohbet
güzel kızlar
bekar kızlar
dul bayanlar
seviyeli insanlar
yarışma
canlı müzik
izdivac
en güzel evlilik
hersey burada
sesliparti
seslisohbet odalari
Sesli adresi
Sesli Chat
SesliChat Siteleri
Sesli Chat sitesi
SesliChat sitesi
SesliSohbet
Sesli Sohbet
Sesli Sohbet Sitesi
SesliSohbet Sitesi
SesliSohbet Siteleri
Muhabbet Sitesi
kamerali chat
Görüntülü Sohbet
Hasret gülleri
Çet sitesi
SesliSohbet
Sesli Sohbet
Canli sohbet
Turkce sohbet
Kurtce Sohbet
Kurtce Chat
Kurtce Muhabbet
Kurtce Sohbet
Kurdish Chat
SesliChat
Sesli Chat
SesliSanal
Guncel Haber
sohbet Sitesi
Chat sitesi..

DiSCo said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
kameralı sohbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
seslidunya
seslisehir
sesli sex

Sesli Chat said...

Really trustworthy blog. Please keep updating with great posts like this one. I have booked marked your site and am about to email it

to a few friends of mine that I know would enjoy reading..
seslisohbet
seslichat
sesli sohbet
sesli chat
sesli
sesli site
görünlütü sohbet
görüntülü chat
kameralı sohbet
kameralı chat
sesli sohbet siteleri
sesli chat siteleri
sesli muhabbet siteleri
görüntülü sohbet siteleri
görüntülü chat siteleri
görüntülü muhabbet siteleri
kameralı sohbet siteleri
kameralı chat siteleri
kameralı muhabbet siteleri
canlı sohbet
sesli muhabbet
görüntülü muhabbet
kameralı muhabbet
birsesver
birses
seslidunya
seslisehir
sesli sex