If you compare vision research with NLP research, there are a lot of interesting parallels. Like we both like linear models. And conditional random fields. And our problems are a lot harder than binary classification. And there are standard data sets that we've been evaluating on for decades and continue to evaluate on (I'm channeling Bob here :P).
But there's one difference so striking that I'd like to call it to center stage. It has to do with "messing with our inputs."
I'll spend a bit more time describing the vision approach, since it's probably less familiar to the average reader. Suppose I'm trying to do handwriting recognition to identify digits from zero to nine (aka MNIST). I get, say, 100 labeled zeros, 100 labeled ones, 100 labeled twos and so on. So a total of 1000 data points. I can train any off the shelf classifier based on pixel level features and get some reasonable performance (maybe 80s-90s, depending).
Now, I want to insert knowledge. The knowledge that I want to insert is some notion of invariance. I.e., if I take an image of a zero and translate it left a little bit, it's still a zero. Or up a little bit. Or if I scale it up 10%, it's still a zero. Or down 10%. Or if I rotate it five degrees. Or negative five. All zeros. The same holds for all the other digits.
One way to insert this knowledge is to muck with the learning algorithm. That's too complicated for me: I want something simpler. So what I'll do is take my 100 zeros and 100 ones and so on and just manipulate them a bit. That is, I'll sample a random zero, and apply some small random transformations to it, and call it another labeled example, also a zero. Now I have 100,000 training points. I train my off the shelf classifier based on pixel level features and get 99% accuracy or more. The same trick works for other vision problems (e.g., recognizing animals). (This process is so common that it's actually described in Chris Bishop's new-ish PRML book!)
This is what I mean by small changes (to the input) begetting good examples. A slightly transformed zero is still a zero.
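To make the trick concrete, here is a minimal sketch in plain Python. It only does random translation (no scaling or rotation), it uses a toy 5x5 grid in place of a real MNIST image, and the names `translate` and `augment` are made up for illustration rather than taken from any library.

```python
import random

def translate(image, dx, dy, fill=0):
    """Shift a 2D image (a list of rows) by dx columns and dy rows, padding with fill."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Pixels pushed past the border are simply dropped
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted

def augment(examples, n_new, max_shift=2, seed=0):
    """Return n_new (image, label) pairs made by randomly shifting sampled examples."""
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_new):
        image, label = rng.choice(examples)
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        # The label is carried over unchanged: a shifted one is still a one
        augmented.append((translate(image, dx, dy), label))
    return augmented

# A tiny 5x5 "one" standing in for a real handwritten digit (an assumption for the demo)
one = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
more_ones = augment([(one, 1)], n_new=10, max_shift=1)
```

The learning algorithm never changes here; all the invariance knowledge lives in the choice of `max_shift` and of which transformations you allow.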
Of course, you have to be careful. If you rotate a six by 180 degrees, you get a nine. If you rotate a cat by 180 degrees, you get an unhappy cat. More seriously, if you're brave, you might start looking at a class of transformations called diffeomorphisms, which are fairly popular around here. These are nice because of their nice mathematical properties, but un-nice because they can be slightly too flexible for certain problems.
Now, let's go over to NLP land. Do we ever futz with our inputs?
Sure!
In language modeling, we'll sometimes permute words or replace one word with another to get a negative example. Noah Smith futzed with his inputs in contrastive estimation to produce negative examples by swapping adjacent words, or deleting words.
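Here is a rough sketch of what that kind of futzing looks like in code. It is not Noah's implementation (contrastive estimation uses the perturbed neighborhood inside the training objective rather than as a list of labeled negatives), and the function names are invented for this example.

```python
def swap_neighbors(words):
    """All sequences obtained by transposing one pair of adjacent words."""
    return [words[:i] + [words[i + 1], words[i]] + words[i + 2:]
            for i in range(len(words) - 1)]

def delete_neighbors(words):
    """All sequences obtained by deleting exactly one word."""
    return [words[:i] + words[i + 1:] for i in range(len(words))]

def negative_examples(sentence):
    """Perturb a whitespace-tokenized sentence into presumed-bad variants."""
    words = sentence.split()
    candidates = swap_neighbors(words) + delete_neighbors(words)
    # Drop any perturbation that happens to reproduce the original sentence
    return [" ".join(c) for c in candidates if c != words]

negatives = negative_examples("the cat sat")
# negatives == ["cat the sat", "the sat cat", "cat sat", "the sat", "the cat"]
```

Note that some of these "negatives" (like "the cat") are perfectly fine strings of English, which is part of why this works as implicit rather than hard negative evidence.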
In fact, try as I might, I cannot think of a single case in NLP where we make small changes to an input to get another good input: we always do it to get a bad input!
In a sense, this means that one thing that vision people have that we don't have is a notion of semantics preserving transformations. Sure, linguists (especially those from that C-guy) study transformations. And there's a vague sense that work in paraphrasing leads to transformations that maintain semantic equivalence. But the thing is that we really don't know any transformations that preserve semantics. Moreover, some transformations that seem benign (e.g., passivization) actually are not: one of my favorite papers at NAACL this year by Greene and Resnik showed that syntactic structure affects sentiment (well, them, drawing on a lot of psycholinguistics work)!
I don't have a significant point to this story other than it's kind of weird. I mentioned this to some people at ICML and got a reaction that replacing words with synonyms should be fine. I remember doing this in high school, when word processors first started coming with thesauri packed in. The result seemed to be that if I actually knew the word I was plugging in, life was fine... but if not, it was usually a bad replacement. So this seems like something of a mixed bag: depending on how liberal you are with defining "synonym" you might be okay doing this, but you might also not be.

51 comments:
I replace words with synonyms to generate more positive examples in Similarity of semantic relations. I use two steps of filtering to remove bad positive examples that this replacement can introduce.
In MT, there are a couple of marginal examples of futzing with the input to produce positive training examples. When you're translating, for example, from Chinese to English, where the source segmentation is not given, it is not unheard of to use two different segmentations (say, word-based and character-based). There was also a workshop paper (WMT08 maybe?) from CMU maybe? on using alignments computed over alternative segmentations to improve word alignment of a particular segmentation.
david vickrey's SRL work is all about transforming (specifically, simplifying) sentences in ways that preserve (the relevant) semantics and training an SRL system over the simplified sentences:
D. Vickrey and D. Koller. Sentence Simplification for Semantic Role Labeling. ACL 2008.
http://www.stanford.edu/~dvickrey/underlying.pdf
We paraphrased MT training data using a deep (HPSG-based) grammar to produce more positive training examples and it worked fairly well. Other people have done it before us (cited in the paper).
The problem is that paraphrasing itself is as hard a problem as MT, so there is no guarantee that this should help. We are taking advantage of the fact that there are often better monolingual resources available than bilingual.
Bond, Francis, Eric Nichols, Darren Scott Appling, Michael Paul (2008) "Improving Statistical Machine Translation by Paraphrasing the Training Data", In IWSLT-2008.
http://web.mysites.ntu.edu.sg/fcbond/open/pubs/2008-iwslt-smt-para.pdf
Images and sound are pretty low-level; language at the text level (characters, words) is comparatively high-level. As such, it's very easy to come up with random perturbations to image or sound that preserve their meaning. But random futzing with text isn't very safe. Add or delete random characters? Words? Sentences? It's a research problem just to know what futzings are meaning-preserving...
The difference between NLP and, for example, vision is that NLP deals with discrete tokens, whereas in vision images commonly consist of "continuous" pixel values, neighborhoods are clearly defined, and thus you can wiggle them around.
The same treatment (i.e. creating new samples by random perturbations) is also done in speech recognition, where the input also consists of continuous data.
In terms of input, a fundamental difference between the two is that word representations already encode significant information, whereas pixels do not. In particular, humans can "think" or reason in terms of words; we do not do the same with pixels. Instead we reason in terms of higher order concepts such as shapes, which can be represented by many different, but similar, pixel representations.
One somewhat related idea in IR is pseudo-relevance feedback. It's a very hands-on approach. Suppose a user issues a very short query to a search engine. The user has some information need that's related to what's available in the indexed corpus. One way to get at this information is to first run the original query through our search engine, collect the top K results, and finally take the important words of the top K results as a new query to issue to the search engine. This is basically a smoothing heuristic. The idea is that the information need of the user can be "equivalently" represented using different bags of words (at least given the current level of sophistication of modern search engines).
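The loop described above can be sketched in a few lines. This is a toy under loud assumptions: the "search engine" is a made-up term-count ranker, "important words" just means the most frequent non-stopwords in the top K results, and every name here (`search`, `expand_query`, `STOPWORDS`) is hypothetical.

```python
from collections import Counter

# A tiny illustrative stopword list, not a real one
STOPWORDS = {"the", "a", "of", "is", "and", "to", "in", "on"}

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def search(query_terms, corpus, k):
    """Rank documents by total occurrences of the query terms; return the top k matches."""
    scored = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def expand_query(query, corpus, k=2, n_terms=3):
    """Run the query, then append the most frequent new words from the top k results."""
    terms = tokenize(query)
    top_docs = search(terms, corpus, k)
    counts = Counter(w for doc in top_docs for w in tokenize(doc) if w not in terms)
    return terms + [w for w, _ in counts.most_common(n_terms)]

corpus = [
    "jaguar is a big cat of the americas",
    "the jaguar cat hunts in the jungle",
    "stock prices fell on monday",
]
expanded = expand_query("jaguar", corpus)   # the original term plus feedback terms
results = search(expanded, corpus, k=2)     # second pass with the expanded query
```

The expanded bag of words is treated as "equivalent" to the original one-word query, which is the sense in which this resembles a label-preserving perturbation of the input.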
I'm not sure if these are small changes (and the implementation is certainly harder than adding translation and scaling in images) but Chris Callison Burch has done work on paraphrasing for MT.
Also Graeme Hirst has done interesting work on lexical choice and near-synonymy.
Maybe someone could define a distance measure for synonyms (something like a KL-divergence of neighbor distributions?) and then perturb the examples using "near" synonyms. Or has anybody done this already?
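One way to read this suggestion, sketched under assumptions: take each word's "neighbor distribution" to be the smoothed counts of words within a small window of it, and use a symmetrized KL divergence as the distance. The window size, the add-alpha smoothing, and all function names below are illustrative choices, not an established method from this thread.

```python
import math
from collections import Counter

def neighbor_counts(word, sentences, window=1):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def kl_divergence(p_counts, q_counts, vocab, alpha=0.1):
    """KL(P || Q) over vocab, with add-alpha smoothing so Q is never zero."""
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl

def synonym_distance(a, b, sentences):
    """Symmetrized KL between the neighbor distributions of words a and b."""
    vocab = {t for s in sentences for t in s.lower().split()}
    pa, pb = neighbor_counts(a, sentences), neighbor_counts(b, sentences)
    # KL is not symmetric, so add both directions
    return kl_divergence(pa, pb, vocab) + kl_divergence(pb, pa, vocab)

sentences = [
    "a big dog barked", "a large dog barked",
    "a big house burned", "a large house burned",
    "the stock market fell",
]
near = synonym_distance("big", "large", sentences)   # identical contexts in this toy corpus
far = synonym_distance("big", "market", sentences)   # disjoint contexts
```

A perturbation scheme would then only swap in words whose distance falls below some threshold, which is exactly the "how liberal are you with synonym" knob from the post.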
As other people have already mentioned, paraphrasing has been used to futz with the inputs for the various components in the MT pipeline: paraphrasing to create new training data and paraphrasing to increase the coverage for a test set. My work is part of the same group and uses sentence-level paraphrasing to create additional inputs for the MERT level of the MT pipeline, i.e., to improve the tuning of the decoder feature weights. And the neat thing is that the sentence-level English paraphraser is built by using nothing more than what you already have for the MT system (by extending CCB's idea).
Are Multiple Reference Translations Necessary? Investigating the Value of Paraphrased Reference Translations in Parameter Optimization. AMTA'08
http://www.umiacs.umd.edu/~nmadnani/pdf/paraphrase-amta08.pdf
Using Paraphrases for Parameter Tuning in Statistical Machine Translation. WMT'07.
http://www.umiacs.umd.edu/~nmadnani/pdf/paraphrase-wmt07.pdf
While not "futzing with the data" per-se, I do have some positive experience with using data to automatically generate positive training samples for an NLP task.
Specifically, for the task of identifying transliterated foreign words (in Hebrew), I got pretty decent results by starting with the CMU-pronunciation dictionary and an English corpus, and generating many possible transliterations for each of the English words based on its pronunciation and some simple and ambiguous phoneme transliteration rules. The vast majority of the generated transliterations would not have been accepted as valid by a human reader, but were quite adequate for training a statistical model to recognize other transliterated words.
(http://www.cs.bgu.ac.il/~yoavg/publications/ForeignWordsArtCameraReady.pdf)
Interesting observation. I wonder if part of the difference, though, arises from the granularity at which you're working. If you have a classification problem in which the universe of images consists of only ten distinct item-types (digits 0 through 9), then there's a lot of room for perturbation and invariance. In the NLP world, this would be like classifying sentences into simple declarations, compound statements, interrogatives, and exclamations; at that level, even in language there is a lot of room for (valid) permutation that wouldn't affect the class of the object. (Sentiment classification might be another such example.) But for parsing or machine translation or automatic summary generation or question answering, the distinctions being made seem to be at a much finer level -- where there would be less room for any valid (or meaningful?) perturbations. These problems are perhaps more akin to digit classification in which the font, weight, and point size all matter. The more details you care about, the fewer invariant transformations exist.