[This post is based on discussions about manifold learning with Ross Whitaker and Sam Gerber, who had a great manifold learning paper at ICCV last year.]
Two assumptions are often used in statistical learning (both theory and practice, though probably more of the latter), especially in the semi-supervised setting: the margin assumption and the manifold assumption. Unfortunately, they're incompatible.
The margin assumption states that your data are well separated. Usually it's stated in reference to linear, possibly kernelized, classifiers, but that need not be the case. As most of us know, plenty of other assumptions boil down to the same thing, such as the low-weight-norm assumption or the Gaussian prior assumption. At the end of the day, it means your data look like the picture on the left below, not the one on the right.
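To make this concrete, here's a minimal sketch in Python with scikit-learn (the toy blobs are my own invention, not data from the post) of fitting a max-margin linear classifier to well-separated data and reading off the margin width:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs: the "left picture" above.
X = np.vstack([rng.normal(-3.0, 0.5, size=(50, 2)),
               rng.normal(+3.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A large C approximates a hard-margin linear SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# In the canonical SVM formulation, the margin width is 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```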
The manifold assumption, on the other hand, states that your data live on (or near) a low-dimensional manifold embedded in the high-dimensional ambient space. The problem, of course, is that once you say "margin" and "manifold" in the same sentence, things just can't possibly work out. You'd end up with a picture like:
In fact, almost by definition, once you stick a margin into a manifold (which is okay, since you'll define the margin Euclideanly, and manifolds know how to deal with Euclidean geometry locally), you're hosed. The trouble is that on a manifold, points from opposite classes can be Euclidean-close while being geodesically far apart, so there's simply no room left for a fat Euclidean margin.
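Here's a sketch of that failure mode, using the classic two-moons toy set (my choice of example, not the post's figures): each class lies on its own one-dimensional manifold, but the moons interleave, so the cross-class Euclidean gap is tiny and no linear max-margin classifier can respect the manifolds:

```python
from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-circles, one 1-D manifold per class.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# The largest achievable Euclidean margin is bounded by the smallest
# cross-class distance, which is tiny relative to each class's extent.
gap = cdist(X[y == 0], X[y == 1]).min()
print("closest cross-class Euclidean distance:", gap)

# A linear max-margin classifier cuts straight through both manifolds.
lin = SVC(kernel="linear").fit(X, y)
print("linear SVM training accuracy:", lin.score(X, y))  # well below 1.0
```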
So I guess the question is: who do you believe?