Comments on natural language processing blog: Manifold Assumption versus Margin Assumption

Feed author: hal. 9 comments. Feed last updated 2024-03-18.

Unknown, 2010-09-13 11:02:

I think the ideas in the paper "Inference with the Universum" will give you a way to seamlessly combine the two. The key idea is that you may have a fairly densely sampled low-d manifold, but only some of the points on it can actually be classified. The rest of the points are just "background" without a ground-truth label. That paper talks about how to do a large-margin classification where the margin is with respect to the manifold space.

Unknown, 2010-09-10 02:35:

Although I see your point, I wonder whether the main problem isn't simply that the manifold assumption is not very likely to hold in many real-world data sets? In particular, the amount of data we need to have a densely sampled manifold grows exponentially with the intrinsic dimensionality of that data. I once saw a study that estimated the intrinsic dimensionality of the space of facial appearance to be about 100, so it seems unlikely we will ever have a data set that densely samples that manifold. Moreover, don't quite a few data sets constitute (fairly) widely separated clusters?

@Anonymous: I am fairly convinced the space of handwritten digits does not form a single manifold, but a bunch of widely separated ones. See, e.g., the t-SNE visualization of the MNIST data at http://homepage.tudelft.nl/19j49/t-SNE_files/mnist_large.jpg

Anonymous, 2010-09-09 16:57:

To realize the benefits of the manifold assumption, isn't it enough to assume that each class belongs to a separate manifold? And in that case, there is not necessarily a contradiction with the margin assumption, right?

hal, 2010-09-09 09:13:

Oops -- I posted this, then got sick, so couldn't respond...

@Anonymous: I guess I don't really follow. You basically seem to be arguing "there is a manifold, but then we sample but don't sample near the margin" but then I don't know what it means for "there [to be] a manifold"? Our data is our only access to whatever there "is" and if it's not sampled near the margin, then I have to say that there's nothing near the margin.

Following the same logic I could say something like "no there's no manifold, it fills up the entirety of R^d, it's just that when we sample we don't sample everywhere." It seems tautological to me.

(Plus, "the set of all possible handwritten symbols form a manifold" is a very strong statement, and I think you'd have a hard time backing that up.)

That said, I think the main issue with manifolds that I wanted to talk about was the <b>dense sampling</b> on a <b>single manifold</b> issue. This is required to make most of the theory I know go through (e.g., the ISOMAP theory), and is also required in practice; otherwise you'll get a meaningless nearest neighbor graph. But if you're densely sampled, and on a single manifold, then there absolutely should be no margin.

@Anonymous2: There really isn't anything :). But, people who do manifold learning almost never use anything complicated about topology. In fact, the fact that the thing is even a manifold (e.g., that it has local tangent spaces) is almost never used!

@Kevin: Yeah, you're right. Densely sampled means roughly that anywhere there is manifold, you'll find "enough" data points, where "enough" has to do with the noise rate.

@Kevin @Alexandre: Yes, of course these things work in practice :). I guess what I'm saying is that saying that you're using a "densely sampled manifold" comes with <i>a lot</i> of hidden assumptions, which I don't think are ever realized. So the question is: can we pare down these assumptions to something that we might actually believe about our data?

Alexandre Passos, 2010-09-09 08:15:

I'm not sure I understand. If you can't maximize some margin-like quantity given the manifold, what is it good for? I thought manifold learning methods modeled each class as its own manifold, and penalized class boundaries in dense regions. The "Analysis of Representations for Domain Adaptation" paper, by Ben-David, Blitzer, Crammer and Pereira even suggests that an optimal strategy for DA looks like a mixture between fitting the source and target data in the same manifold and maximizing the margin between the classes.

Kevin Duh, 2010-09-08 17:20:

I think there are similarities between the two assumptions at a deeper level: both say that labels are locally smooth. The difference is the space in which this operates (Euclidean vs. Riemannian). Perhaps your paradox comes from the idea that the manifold should be "densely sampled"--I'm not sure about this point, what exactly does this mean?

In practice, it's well known that which assumption you ought to use depends a lot on the dataset. Chapelle's semi-supervised learning book has a nice experimental comparison suggesting that image data seems more manifold-like, and text data seems more cluster-like.

I guess it won't be too hard to test your hypothesis about whether the two are incompatible: (1) run some manifold learning algorithm to embed the data (e.g. image) in low dimensional Euclidean space. (2) run a margin classifier like Transductive SVM. If it performs horribly compared to directly applying a TSVM or Graph-SSL algorithm, then that might suggest they are incompatible.

Misc Research Stuff (http://resnotebook.blogspot.com), 2010-09-08 09:46:

I agree with Anonymous: using manifold regularization in max-margin classifiers makes sense and shows improvements over either of the methods alone in multiple datasets I've used.

Anonymous, 2010-09-08 03:57:

Hi, I have a kind of "meta question": What are good references (e.g. books, articles, etc.) for a beginning ML/DM student to gain (mostly) math background on the concepts discussed in this post? For example, manifold comes from topology, right? Any good and preferably short introduction to that? Also what about "dense sampling"?

Thank you very much!
Stelios

Anonymous, 2010-09-07 21:24:

I don't agree that the two assumptions are incompatible.

The margin assumption arises from a particular sampling scheme on the manifold.

For example, take the USPS digits dataset that is so often used; the set of all possible handwritten symbols form the manifold, and the sampling (9 digits) form the margin.

There are tradeoffs between regularizing for margins and for smoothness, but regardless it is misleading to claim these two are incompatible.
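
The disagreement between the earliest comment (the margin arises from the sampling scheme on the manifold) and hal's reply (a densely sampled single manifold should have no margin) can be made concrete with a toy sketch. Everything here is an assumption for illustration and none of it comes from the thread: the "manifold" is just the interval [0, 1], the class boundary sits at 0.5, and the names `sample_interval`, `empirical_margin`, and `gap` are hypothetical. With uniform sampling the gap between the closest oppositely labeled points shrinks as more points are drawn. A sampling scheme that never draws near the boundary keeps a margin however many points are drawn.

```python
import random


def sample_interval(n, gap=0.0, seed=0):
    """Draw n labeled points from the toy 1-D manifold [0, 1].

    Points with t < 0.5 get label 0, the rest get label 1. Any draw that
    falls within `gap` of the class boundary at 0.5 is rejected, which
    mimics a sampling scheme that never observes data near the boundary.
    """
    if not 0.0 <= gap < 0.5:
        raise ValueError("gap must lie in [0, 0.5)")
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        t = rng.random()
        if abs(t - 0.5) < gap:
            continue  # too close to the boundary: this scheme never sees it
        points.append((t, 0 if t < 0.5 else 1))
    return points


def empirical_margin(points):
    """Distance between the closest pair of points with different labels.

    In this 1-D setup every label-0 point lies to the left of every
    label-1 point, so the closest pair is (largest left, smallest right).
    """
    left = [t for t, y in points if y == 0]
    right = [t for t, y in points if y == 1]
    if not left or not right:
        raise ValueError("need at least one point from each class")
    return min(right) - max(left)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        dense = empirical_margin(sample_interval(n))
        gapped = empirical_margin(sample_interval(n, gap=0.1))
        print(f"n={n:6d}  uniform margin={dense:.5f}  gapped margin={gapped:.5f}")
```

Whether the gapped case counts as "one manifold sampled with a margin" or "two separate manifolds" is exactly the question the thread argues over; the sketch only shows that the data alone cannot tell the two descriptions apart.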