25 September 2009

Some notes on job search

There are tons of "how to apply for academic jobs" write-ups out there; this is not one of them. It's been four years (egads!) since I began my job search and there are lots of things I think I did well and lots of things I wish I had done differently.

When I entered grad school, I was fairly sure that I eventually wanted a university job. During high school, my career goal was to be a high school math teacher. Then I went to college and realized that, no, I wanted to teach math to undergraduates. Then I was an advanced undergraduate and realized that I wanted to teach grads and do research. Teaching was always very important to me, though of course I fell in love with research later. It was unfortunate that it took so long for me to actually get involved in research, but my excuse was that I wasn't in CS, where REU-style positions are plentiful and relatively easy to come by (system development, anyone?).

However, the more time I spend in grad school, including an internship at MSR with Eric Brill (but during which I befriended many in the NLP group at MSR, a group that I still love), I realized that industry labs were a totally great place to go, too.

I ended up applying to basically everything under the sun, provided they had a non-zero number of faculty in either NLP or ML. I talked (mostly off the record) with a few people about post-doc positions (I heard later than simultaneously exploring post-docs and academic positions is not a good idea: hiring committees don't like to "reconsider" people; I don't know how true this is, but I heard it too late myself to make any decisions based on it), applied for some (okay, many) tenure-track positions, some research-track positions (okay, few) and to the big three industry labs. I wrote three cover letters, one more tailored to NLP, one more to ML and one more combined, three research statements (ditto) and one teaching statement. In retrospect, they were pretty reasonable, I think, though not fantastic. I don't think I did enough to make my future research plans not sound like "more of the same."

I suppose my biggest piece of advice for applying is (to the extent possible) find someone you know and trust at the institution and try to figure out exactly what they're looking for. Obviously you can't change who you are and the work you've done, but you definitely can sell it in slightly different ways. This is why I essentially had three application packages -- the material was the same, the focus was different. But, importantly, they were all true. The more this person trusts you, the more of the inside scoop they can give you. For instance, we had a robotics/ML position open (which, sadly, we had to close due to budget issues), but in talking to several ML people, they felt that they weren't sufficiently "robotics" enough; I think I was able to dissuade them of this opinion and we ended up getting a lot of excellent applicants before we shut down the slot.

Related, it's hard to sell yourself across two fields. At the time I graduated, I saw myself as basically straddling NLP and ML. This can be a hard sell to make. I feel in retrospect that you're often better off picking something and really selling that aspect. From the other side of the curtain, what often happens is that you need an advocate (or two) in the department to which you're applying. If you sell yourself as an X person, you can get faculty in X behind you; if you sell yourself as a Y person, you can get faculty in Y behind you. However, if you sell yourself as a mix, the X faculty might prefer a pure X and the Y faculty might prefer a pure Y. Of course, this isn't always true: Maryland is basically looking for a combined NLP/ML person this year to compliment their existing strengths. Of course, this doesn't always hold: this is something that you should try to find out from friends at the places to which you're applying.

For the application process itself, my experience here and what I've heard from most (but not all) universities is that interview decisions (who to call in) get made by a topic-specific hiring committee. This means that to get in the door, you have to appeal to the hiring committee, which is typically people in your area, if it's an area-specific call for applications. Typically your application will go to an admin, first, who will filter based on your cover letter to put you in the right basket (if there are multiple open slots) or the waste basket (for instance, if you don't have a PhD). It then goes to the hiring committee. Again, if you have a friend in the department, it's not a bad idea to let them know by email that you've applied after everything has been submitted (including letters) to make sure that you don't end up in the waste bin.

Once your application gets to the hiring committee, the hope is that they've already heard of you. But if they haven't, hopefully they've heard of at least one of your letter writers. When we get applications, I typically first sort by whether I've heard of the applicant, then by the number of letter writers they have that I've heard of, then loosely by the reputation of their university. And I make my way down the list, not always all the way to the bottom. (Okay, I've only done this once, and I think I got about 90% of the way through.)

In my experience, what we've looked for in applications is (a) a good research statement, including where you're going so as to distinguish yourself from your advisor, (b) a not-bad teaching statement (it's hard to get a job at a research university on a great teaching statement, but it's easy to lose an offer on a bad one... my feeling here is just to be concrete and not to pad it with BS -- if you don't have much to say, don't say much), (c) great letters, and (d) an impressive CV. You should expect that the hiring committee will read some of your papers before interviewing you. This means that if you have dozens, you should highlight somewhere (probably the research statement) what are they best ones that they should read. Otherwise they'll choose essentially randomly, and (depending on your publishing style) this could hurt. As always, put your best foot forward and make it easy for the hiring committee to find out what's so great about you.

Anyway, that's basically it. There's lots more at interview stage, but these are my feelings for application stage. I'd be interested to hear if my characterization of the hiring process is vastly different than at other universities; plus, if there are other openings that might be relevant to NLP/ML folks, I'm sure people would be very pleased to seem them in the comments section.

Good luck, all your graduating folks!

09 September 2009

Where did you Apply to Grad School?

Ellen and I are interested (for obvious reasons) in how people choose what schools to apply to for grad school. Note that this is not the question of how you chose where to go. This is about what made the list of where you actually applied. We'd really appreciate if you'd fill out our 10-15 minute survey and pass it along to your friends (and enemies). If you're willing, please go here.

07 September 2009

ACL and EMNLP retrospective, many days late

Well, ACL and EMNLP are long gone. And sadly I missed one day of each due either to travel or illness, so most of my comments are limited to Mon/Tue/Fri. C'est la vie. At any rate, here are the papers I saw or read that I really liked.
  • P09-1010 [bib]: S.R.K. Branavan; Harr Chen; Luke Zettlemoyer; Regina Barzilay
    Reinforcement Learning for Mapping Instructions to Actions


    P09-1011 [bib]: Percy Liang; Michael Jordan; Dan Klein
    Learning Semantic Correspondences with Less Supervision

    these papers both address what might roughly be called the grounding problem, or at least trying to learn something about semantics by looking at data. I really really like this direction of research, and both of these papers were really interesting. Since I really liked both, and since I think the directions are great, I'll take this opportunity to say what I felt was a bit lacking in each. In the Branavan paper, the particular choice of reward was both clever and a bit of a kludge. I can easily imagine that it wouldn't generalize to other domains: thank goodness those Microsoft UI designers happened to call the Start Button something like UI_STARTBUTTON. In the Liang paper, I worry that it relies too heavily on things like lexical match and other very domain specific properties. They also should have cited Fleischman and Roy, which Branavan et al did, but which many people in this area seem to miss out on -- in fact, I feel like the Liang paper is in many ways a cleaner and more sophisticated version of the Fleischman paper.

  • P09-1054 [bib]: Yoshimasa Tsuruoka; Jun’ichi Tsujii; Sophia Ananiadou
    Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

    This paper is kind of an extension of the truncated gradient approach to learning l1-regularized models that John, Lihong and Tong had last year at NIPS. The paper did a great job at motivated why L1 penalties is hard. The first observation is that L1 regularizes optimized by gradient steps like to "step over zero." This is essentially the observation in truncated gradient and frankly kind of an obvious one (I always thought this is how everyone optimized these models, though of course John, Lihong and Tong actually proved something about it). The second observation, which goes into this current paper, is that you often end up with a lot of non-zeros simply because you haven't run enough gradient steps since the last increase. They have a clever way to accumulating these penalties lazily and applying them at the end. It seems to do very well, is easy to implement, etc. But they can't (or haven't) proved anything about it.

  • P09-1057 [bib]: Sujith Ravi; Kevin Knight
    Minimized Models for Unsupervised Part-of-Speech Tagging

    I didn't actually see this paper (I think I was chairing a session at the time), but I know about it from talking to Sujith. Anyone who considers themselves a Bayesian in the sense of "let me put a prior on that and it will solve all your ills" should read this paper. Basically they show that sparse priors don't give you things that are sparse enough, and that by doing some ILP stuff to minimize dictionary size, you can get tiny POS tagger models that do very well.

  • D09-1006: [bib] Omar F. Zaidan; Chris Callison-Burch
    Feasibility of Human-in-the-loop Minimum Error Rate Training

    Chris told me about this stuff back in March when I visited JHU and I have to say I was totally intrigued. Adam already discussed this paper in an earlier post, so I won't go into more details, but it's definitely a fun paper.

  • D09-1011: [bib] Markus Dreyer; Jason Eisner
    Graphical Models over Multiple Strings

    This paper is just fun from a technological perspective. The idea is to have graphical models, but where nodes are distributions over strings represented as finite state automata. You do message passing, where your messages are now automata and you get to do all your favorite operations (or at least all of Jason's favorite operations) like intersection, composition, etc. to compute beliefs. Very cool results.

  • D09-1024: [bib] Ulf Hermjakob
    Improved Word Alignment with Statistics and Linguistic Heuristics

    Like the Haghighi coreference paper below, here we see how to do word alignment without fancy math!

  • D09-1120: [bib] Aria Haghighi; Dan Klein
    Simple Coreference Resolution with Rich Syntactic and Semantic Features

    How to do coreference without math! I didn't know you could still get papers accepted if they didn't have equations in them!
In general, here's a trend I've seen in both ACL and EMNLP this year. It's the "I find a new data source and write a paper about it" trend. I don't think this trend is either good or bad: it simply is. A lot of these data sources are essentially Web 2.0 sources, though some are not. Some are Mechanical Turk'd sources. Some are the Penn Discourse Treebank (about which there were a ridiculous number of papers: it's totally unclear to me why everyone all of a sudden thinks discourse is cool just because there's a new data set -- what was wrong with the RST treebank that it turned everyone off from discourse for ten years?! Okay, that's being judgmental and I don't totally feel that way. But I partially feel that way.)