01 May 2006

Getting Started in NLP

Since starting the blog, a few people have asked me how one can get started in NLP, while residing in a department lacking NLP researchers. This is a difficult question: I fell into NLP quite naturally when I was at CMU and made an easy transition to grad school at USC, both of which have awesome NLP groups. Lacking such internal support, one has to be much more ambitious to get to the point where one could do real research in the field. The obvious avenues for support are: reading books (which ones?) and papers (from where and by whom?), going to nearby conferences (which ones?) and experimentation (on what?). (New option: read and post to this blog!)

The four standard books in the field are Foundations of Statistical Natural Language Processing (Manning + Schütze), Speech and Language Processing (Jurafsky + Martin), Statistical Language Learning (Charniak) and Natural Language Understanding (Allen). The latter two are much older, though some people prefer Charniak to Manning + Schütze. I would probably pick up Manning + Schütze if I could only buy one. From this book, I think that skimming Chapters 1, 4, 6 and 13 should give a reasonable (but not uniformly sampled) picture of the background knowledge everyone should have. Unfortunately, this misses many topics: information extraction, summarization, question answering, dialog systems, discourse, morphology, ontologies, pragmatics, semantics, sentiment analysis and textual entailment.

Finding good papers for beginners is hard. Without guidance, skimming titles and abstracts of papers published in ACL, NAACL, HLT or COLING since 2002 or 2003 should enable someone to find out what looks interesting to them. I know many advisors take this approach with new students. The ACL anthology is great for finding old papers. I'll probably post at a later date about what the "must reads" are for the areas I know best. Once you've found a few papers you like, I'd check out the respective authors' web pages and see if they have any related work (best bet is probably to look at the advisor's page: often s/he will have multiple students working on similar topics). Also, advisors often have course material and slides from tutorials: these are great places to get introductory-level material.

If you happen to get lucky (I never have) and one of the above conferences is located nearby, I'd just go. Presentations of papers (if they're good) are often better from the perspective of getting the high-level overview than the papers themselves, since papers have to be technically complete.

I'm perhaps overcommitting myself, given my promise to talk more about structured prediction, but over the next few weeks/months, I'll work on a "Getting Started in X" series. X will likely range over the set { summarization, sequence labeling, information extraction, machine translation, language modeling }. Requests for other topics will be heard, keeping in mind I'm not an expert in many areas.

7 comments:

Anonymous said...

This is great stuff. My copy of Manning and Schütze actually just arrived this past Saturday. I'm looking forward to your "must reads" list.

Anonymous said...

Very useful!! Thanks.

Anonymous said...

Maybe someone will be interested in this reading list:
http://ciir.cs.umass.edu/~fuchun/readlist_all/readlist/readlist.html

Anonymous said...

I'd recommend my Amazon list: Computational Linguistics Syllabus.

Manning and Schuetze is heavily biased toward problems that are easy using statistical NLP. And it's now out of date in the sense that most of the current techniques used in papers at ACL conferences are not covered. There are also a lot of typos -- check out their site for a list of errata. I find Jurafsky and Martin a better overall intro, especially when combined with the awesome statistical learning book by Hastie et al.

Anonymous said...

For those of you with a membership to MIT CogNet, you should be able to read/download all of Manning & Schütze in digital form here.

There's also a pretty thorough companion website for the book, and lots of relevant slides linked from the syllabus to Chris's NLP class.

PS: I love the blog -- I look forward to your updates!
