tag:blogger.com,1999:blog-19803222.post8684805379782673145..comments2024-03-18T01:45:45.724-06:00Comments on natural language processing blog: What's the Use of a Crummy Translation?halhttp://www.blogger.com/profile/02162908373916390369noreply@blogger.comBlogger12125tag:blogger.com,1999:blog-19803222.post-45596581663974182012007-08-14T17:08:00.000-06:002007-08-14T17:08:00.000-06:00Some rumblings...Are those texts translations? Wha...Some rumblings...<BR/><BR/>Are those texts translations? What is a translation for you? The work of translation is more often than not underestimated, thought of as a trivial task, and it is frequently miserably paid. People start to think about what a translation is only when something has to be done with the product apart from laughing.<BR/>Few people would say something is a car if it never moves. 
People are too used to just glancing at a translation, giving up, and trying to work out the meaning by themselves from context... until they have to deal with longer messages that cannot be guessed from the look of a machine.<BR/>MT has been improving a lot. Still, many endeavors in this area would progress faster if people were humble enough to ask what AI people have so often failed to ask:<BR/>what is our general theory of this? (in this case, a general theory of translation)<BR/>The Turing test for AI was a bad premise. A parrot can talk and often fool people, and yet few would say it is very intelligent. The Turing test and the reluctance to think about what intelligence really is have led to lots of nice gadgets but too few advances for the effort put into AI.<BR/>In the same way as many AI people in general have often failed to sit still for a moment and think of a theory of mind (remember On Intelligence, by Jeff Hawkins?), many people in NLP have failed to first ask what their theory of the language is (and a theory of the language needs to be more than a chosen formalism).<BR/><BR/>What do we expect from MT? To surpass the work of people who are doing their best at a job they are not capable of but for which most companies do not want to pay enough? Do we want the system to do some kind of understanding? (very high level, and then define understanding)<BR/>Or do we take some middle path and decide to go for a system that can render the intended meaning of most of the sentences? Say 70, 80%?<BR/>How robust? In what text fields?<BR/><BR/>Andrés<BR/><BR/><A HREF="http://www.crossminder.com" REL="nofollow"><BR/>Crossminder<BR/></A>Andres Dominguezhttps://www.blogger.com/profile/03357906785806200754noreply@blogger.comtag:blogger.com,1999:blog-19803222.post-29388750195901051392007-07-25T13:32:00.000-06:002007-07-25T13:32:00.000-06:00I'm not a big fan of complexly weighted utility me...I'm not a big fan of complexly weighted utility metrics. 
If you can motivate one with a task, fair enough.<BR/><BR/>Weighting by IDF would be similar to macro-average results for classifiers (metric is average over types, not tokens). <BR/><BR/>What you need will depend on task. For "needle in a haystack" kinds of text mining, you need good recall on items not in the training set. For "what do people think of the iPhone", it's much easier to get an answer because the signal is hugely redundant. I'd want high recall for the former and high precision for the latter, most likely.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-19803222.post-26874203032977122802007-07-24T03:52:00.000-06:002007-07-24T03:52:00.000-06:00Bob, do you think one should weight accuracy calcu...Bob, do you think one should weight accuracy calculation by e.g. the TF/IDF score of the word, to get a more relevant accuracy measure?Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-19803222.post-85027231757850082852007-07-21T11:30:00.000-06:002007-07-21T11:30:00.000-06:00I love how SMT systems omit a negative every now a...I love how SMT systems omit a negative every now and then, translating the phrase as the exact opposite of its original meaning. Idiomatic expressions and colloquialisms are often a great source of entertainment as well.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-19803222.post-57558157219620996352007-07-20T12:44:00.000-06:002007-07-20T12:44:00.000-06:00It's always nice to have a low baseline. A partic...It's always nice to have a low baseline. A particularly useful baseline for speech systems is call center attendant performance. The attendants aren't dumb, they're just stressed from too little time to make a decision (typically under 20 seconds), too little training about the business logic (typically hundreds of destinations, with half a day training and a 3-ring binder), and too little experience (typically under six months).<BR/><BR/>It really becomes a cost issue. 
How much are you willing to pay for a good translation? For the Chinese tour bus companies, not much. How much are companies willing to pay for telephone support? Again, not much.<BR/><BR/>Just don't confuse baselines with toplines. People can do call routing and hotel sign translation at near 100% accuracy. <BR/><BR/>The bigger issue is that all of NLP is crummy. 90% entity extraction precision means developing systems to deal with errors in 1 in 10 low-level decisions. (Not to mention developing whole new systems to deal with the lack of recall.) 97% tagger accuracy still means one word in a sentence is likely wrong; not coincidentally, that word's likely to be the most discriminative one in the sentence (in the TF/IDF sense), such as a noun or adjective, rather than a function word like "the", which is (almost) always tagged correctly, padding the accuracy stats.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-19803222.post-36293986576169333532007-07-20T08:21:00.000-06:002007-07-20T08:21:00.000-06:00Excellent post Hal. My company is the group that D...Excellent post Hal. My company is the group that Dave Novakovic was referring to, developing Asian SMT systems, and what you present in this entry is exactly what we subscribe to. <BR/><BR/>I previously worked as VP and Research Director for Gartner in Asia Pacific, and one of the messages I often tried to get across to people looking for perfection was that it often was not necessary. Good enough for the current task is what is necessary. Many businesses are using wikis, blogs and other tools internally that are far from perfect but usable and good enough for the purpose they are being used for. Sure, they could be better and offer more, but they are good enough. <BR/><BR/>MT is the same, and the examples you pointed out are as good as any. MT has improved greatly, and new techniques and new resources are making MT better every day. 
<BR/><BR/>Some of the languages we are working on are difficult to deal with programmatically, such as Thai. Thai does not have spaces, punctuation, periods or anything useful to determine words, paragraphs, sentences etc. I have a sentence that runs 27 pages - try feeding that through an SMT system :) Research has been limited to date because the basic low-level technology is hard to master just to reach the point where most other languages start. <BR/><BR/>We will have our first Thai system up for demo in 2-3 weeks. It will be far from perfect, but it will be better than any of the other limited, rule-based Thai translation systems available today, and it will be "good enough." Sure, there will be improvements over time and we are already working on some, but there is also a point of diminishing returns. <BR/><BR/>We are training on multiple domains with corpus sizes exceeding 10 million sentence pairs. It will be an interesting language to monitor to see how it stacks up in SMT and what the point of diminishing returns is for domains. <BR/><BR/>As I am sure you are aware, gathering a corpus is no fun: it is time-consuming and expensive. That point of being "good enough" is going to be key for us. At this stage, if key messages are clearly presented, even if the grammar is a little off, then that to me is "good enough". <BR/><BR/>Future enhancements such as syntax trees and applying morphological data in some of the processes will likely give us greater quality than going too far on corpus. <BR/><BR/>BTW, my personal favorite from China is "Passage of deformed man". The Chinese read "wheelchair ramp" - that one was not quite "good enough"<BR/><BR/>Regards<BR/><BR/>DionDion Wigginshttps://www.blogger.com/profile/06923497470039599497noreply@blogger.comtag:blogger.com,1999:blog-19803222.post-58761243185660444142007-07-20T00:29:00.000-06:002007-07-20T00:29:00.000-06:00Machine versus people. Not sure if I get that. Whi...Machine versus people. Not sure if I get that. 
Whilst I agree with the concept of MT, I don't subscribe to its use commercially. Surely what we should all be looking at is contextual memory and the relationship of language to market and market language to brand or business or sector. Systems are already emerging that use artificial intelligence and are developing faster than MT. Ultimately the quality of our language as a communications medium is going to be the demonstrable proof that we can communicate with our fellow man or woman in their country and for their business.malford385https://www.blogger.com/profile/12805671169963902266noreply@blogger.comtag:blogger.com,1999:blog-19803222.post-16906405891531324312007-07-19T19:27:00.000-06:002007-07-19T19:27:00.000-06:00John: SMT has long been overlooked as a viable MT ...John: SMT has long been overlooked as a viable MT method simply because of the prohibitively high cost of processing. I'm guessing that would be a limitation on embedded devices too. Hopefully not for long though.<BR/><BR/>Disclaimer: I don't work for these guys, but I'm very interested in and excited about what they are doing.<BR/>I got a little more info about the guys specialising in Asian SMT:<BR/>1. they are currently working on over 100 high-quality Asian language pairs, with many more to come.<BR/>2. have support for domain-specific translations (thus resulting in a much higher quality translation)<BR/>3. will have a Thai beta translation done soon. Thai is an extremely hard language to translate because of its lack of spaces, periods and attention :)<BR/>4. they are looking for computational linguists from Asian countries to help with pre- and post-processing of their translations. So if you have experience in the area get in <A HREF="http://dpn.name/index.php/contact/" REL="nofollow">contact with me</A> and I'll give you a contact. 
Or sign up for their mailing list at <A HREF="http://www.asiaonline.net/" REL="nofollow">asiaonline.net</A>.David Novakovichttps://www.blogger.com/profile/14193067337289063998noreply@blogger.comtag:blogger.com,1999:blog-19803222.post-30726791605297830722007-07-19T09:15:00.000-06:002007-07-19T09:15:00.000-06:00Let's see how Google Translate performs in these c...Let's see how Google Translate performs in these cases:<BR/><BR/>此優惠券不可兌換現金,而且只可使用一次。<BR/>This Coupon non-convertible cash, but can only be used once.<BR/><BR/>管理層保留最終解釋權。<BR/>Management retain the final power of interpretation.<BR/><BR/>(When the concept of "final" is removed, as in the original sign:<BR/>管理層保留解釋的權利。<BR/>Management explained to retain the rights.)<BR/><BR/>辦公室零食不可拿走。<BR/>Office snack foods can not take away.<BR/><BR/>請回收汽水瓶。<BR/>Please recall the bottle. <BR/><BR/>(In the word order of the original sign: <BR/>汽水瓶請回收。<BR/>Please bottle recycling.)<BR/><BR/>Hmm. It works okay.<BR/><BR/>This reminds me of the electronic English-Chinese dictionary that I used to have. It worked great for looking up words. It would be cool if it could do simple MT as well. Now some electronic dictionaries can already do simple MT, e.g. <A HREF="http://translate.google.com/translate?u=http%3A%2F%2Fwww.instant.com.tw%2Finstantdir%2Fnews%2Fnews_detail.asp%3Fnews_id%3D50&langpair=zh%7Cen&hl=en&ie=UTF8" REL="nofollow">this one</A>. I'm not sure about the technology underlying this or how well it performs, but I wonder what current <I>statistical</I> MT technologies can do in a hand-held device like this.Anonymousnoreply@blogger.comtag:blogger.com,1999:blog-19803222.post-75919474286819863182007-07-19T05:28:00.000-06:002007-07-19T05:28:00.000-06:00There is some very promising work being done on SM...There is some very promising work being done on SMT for Asian languages; I wish I could say more. Needless to say, there are leaps and bounds being made in the field. 
There are different ways of thinking about SMT, and Google's BLEU scores are just not quite there; there are ways to get much better translations using the same techniques. Obviously I don't know the internals of Google's SMT tech, but the differences run to a very fundamental level. I regret having to be so cryptic, but I respect my agreements :) It does have something to do with this though: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.htmlDavid Novakovichttps://www.blogger.com/profile/14193067337289063998noreply@blogger.com