Comments on natural language processing blog: Making sense of Wikipedia categories
Feed author: hal (http://www.blogger.com/profile/02162908373916390369)
Updated: 2024-03-18T01:45:45.724-06:00
12 comments

Anonymous, 2012-03-11T14:52:46.166-06:00

What about counting the number of paths to a topic from a category? It would be linear in number of nodes * number of categories.

Yves (http://moustaki.org), 2012-02-27T11:21:53.768-07:00

I used the Wikipedia categories to define a vector space (mostly for disambiguation purposes), which gave OK results. The code I used for doing that is available on our Github (https://github.com/bbcrd/rdfsim), in case that's useful. Using them literally though, given the weird things you point out, will probably give very odd results :)

Jessica (https://www.blogger.com/profile/10936027979999135776), 2012-02-26T12:04:32.233-07:00

I did a bachelor's thesis on using Wikipedia categories for NE recognition, based on this paper: http://www.mt-archive.info/ACL-2008-Richman.pdf . But that uses categories from the bottom up, so to speak, whereas you are talking about top-down.

I wasn't able to reproduce the same level of results as in that paper, but my software was surely much cruder and my level of knowledge much lower than the authors'.

Athena (https://www.blogger.com/profile/01287602914223430594), 2012-02-25T22:13:20.139-07:00

Depending on your use case, you might find YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago/) useful.
We had started with Wikipedia, but switched to YAGO for our paper (http://www2011india.com/proceeding/companion/p21.pdf) on answering transitive type-entity queries.
The YAGO category hierarchy is much cleaner and more manageable.

Unknown (https://www.blogger.com/profile/15797413064239466971), 2012-02-22T04:31:32.368-07:00

My method is stupid but effective: manual pruning. First I generate the hierarchical taxonomy, and then skim it by dragging the scrollbar. If I find something of no concern, I locate where the hierarchy goes astray and add the topmost undesired category to a "blacklist". The next time the taxonomy is generated, the children of this category won't be added to the hierarchy. Iterate this process and the taxonomy becomes purer and purer.

chris (https://www.blogger.com/profile/14662093233141360874), 2012-02-20T18:16:30.380-07:00

I have refined the Wikipedia category graph for use in the INEX 2010 XML Mining track. This was done by finding the shortest paths between a page and any of the 'Main Topic Classifications'. It results in a multi-label category structure. I only used the last 2 vertices of the shortest path sequences and threw away small categories.

You can find the details in the paper at http://eprints.qut.edu.au/41223/.

Chandra Sekhar (https://www.blogger.com/profile/03572564234618673213), 2012-02-20T11:50:59.642-07:00

I am not very sure what your use case is. But I've used the Wikipedia category graph to measure relatedness between two articles. I guess the distance between the category "Biology" and "Chicago Stags coaches" is a good estimate of how (un)related they are.

But before creating the graph, I excluded the root node (Category:Contents) so as to get rid of "too much of generalization".

Riccardo Tasso (http://www.twitter.com/riccardotasso), 2012-02-19T12:12:03.734-07:00

Look at our work on it: http://airlab.elet.polimi.it/images/3/3e/Macro-categories.pdf

Another resource to look at is the DBPedia ontology.

Cheers,
Riccardo Tasso

Anonymous (https://www.blogger.com/profile/04201477051937129821), 2012-02-19T10:33:10.096-07:00

Maybe you need to calculate the article-per-topic distributions:
P_1_2 = P({a from C_1} | {a from C_2}) and P_2_1 = P({a from C_2} | {a from C_1})
And based on these distributions, calculate the topic hierarchy:
P_1_2 = P_2_1 => C_1 = C_2
P_1_2 > P_2_1 => C_1 -> C_2

It is not hard to calculate using some kind of inverted index: C_1 -> {a_1, a_2, ... a_n}

JoKnopp (https://www.blogger.com/profile/03752487754524990866), 2012-02-19T05:41:55.164-07:00

Your analysis is right: the category structure is just not very reliable. So one way to deal with it is to regard categories more as tags than as really structured information. I used this idea for NE classification:

http://www.aclweb.org/anthology/W11-3607 (PDF)

I think the global structure is way too messy for meaningful analysis. Maybe for many problems it is sufficient to look at local structure in the category network.

gromgull (https://www.blogger.com/profile/02832687911868366187), 2012-02-19T04:26:02.594-07:00

I quite like the paper:

Wu, Fei and Weld, Daniel S. (2008). Automatically Refining the Wikipedia Infobox Ontology. In Proceedings of the 17th International World Wide Web Conference (WWW-08), Beijing, China, April 2008.

It covers some of this, but actually I think they introduce MORE categories (as well as assigning more articles to categories).

Ryan (http://snarfed.org/), 2012-02-19T00:52:34.872-07:00

out of curiosity, have you looked at other dbs that map (at least partially) to wikipedia and have their own taxonomies? http://freebase.com/ , for example, which i've been working with a bit recently.
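
Editorial sketch: the conditional-probability rule proposed in the 2012-02-19 comment (comparing P_1_2 and P_2_1 over an inverted index from categories to articles) might look roughly like the following in Python. This is only an illustration under stated assumptions, not the commenter's code: the function name `compare_categories`, the `tol` parameter, and the choice to skip pairs with no shared articles are all hypothetical. The comment does not say what the arrow in "C_1 -> C_2" means; since P_1_2 > P_2_1 can only hold when C_1 has more articles than C_2, one plausible reading is that C_1 is the broader category.

```python
from itertools import combinations


def compare_categories(index, tol=1e-9):
    """Apply the P_1_2 / P_2_1 rule to every pair of categories.

    index: dict mapping a category name to the set of its article ids
           (the "inverted index" from the comment).
    Returns (equal, directed): `equal` holds pairs judged C_1 = C_2,
    `directed` holds pairs (C_1, C_2) meaning "C_1 -> C_2".
    """
    equal, directed = [], []
    for c1, c2 in combinations(sorted(index), 2):
        a1, a2 = index[c1], index[c2]
        shared = len(a1 & a2)
        if shared == 0:
            # Assumption: pairs with no shared articles are unrelated.
            continue
        p_1_2 = shared / len(a2)  # P(a in C_1 | a in C_2)
        p_2_1 = shared / len(a1)  # P(a in C_2 | a in C_1)
        if abs(p_1_2 - p_2_1) <= tol:
            equal.append((c1, c2))
        elif p_1_2 > p_2_1:
            directed.append((c1, c2))
        else:
            directed.append((c2, c1))
    return equal, directed
```

Comparing every pair is quadratic in the number of categories; on the real category graph one would presumably restrict the comparison to categories that share at least one article.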