I feel it is increasingly interesting to look at problems that have both a linguistic and a non-linguistic signal. The simplest example I can think of here is recent work in looking at images and their captions (eg., the matching words and pictures approach, though this is not the only such example). The idea in many of these techniques is that we have a single data set that contains two types of data. One is language (eg., the captions) and the other is something else (eg., the images). There are many other potentially interesting sources of data like this that I'll discuss after saying why I think these problems are interesting.
I think there are two basic reasons why such "dual" problems are interesting:
- By looking at multiple modalities, we can get cross-modality reinforcement of what we're learning. I think a great example of that is this paper, which uses face recognition together with textual coreference in order to do unsupervised coref in captions. The idea is that if you can tell that two different pictures contain the entity we call "Bill Clinton" in them, then it's more likely that a name in their caption will corefer. This gives us a greater ability to share data across a single data set.
- When language is treated in the context of some other source of information, we get a sort of grounding effect. That is, continuing the coref example from #1, we have--in some sense--grounded the string "Bill Clinton" to a pictorial representation of a person. Sure, this representation is still just a bunch of symbols, but they're markedly different symbols from the ones we use to represent the entity in purely linguistic terms. Perhaps hard-core cognitive scientists wouldn't call this grounding, but it's getting there. (Also along these lines, one of my favorite papers from a few years ago was my ex-ISI friend Mike doing grounding by looking and language and action in video games.)
It seems that vision+language is the hot topic, or at least the warmest of the bunch. This is probably because vision people and language people tend to overlap and meet at machine learning conferences. It's probably also because the sorts of techniques used by the two communities are perhaps the most similar. But I think there's lots of room for looking at other "X"s. For instance, biology. There are lots of data sets (eg., GEO) that contain textual information ("these are cell lines depicting ovarian cancer") plus the actual cell lines. Heck, many papers in PubMed contain such information, albeit in figures rather than matrices. Robotics is another option. Ray Mooney's group down in Texas has worked at understanding orders given to robocup competitors based on language information (eg., this paper). Perhaps the oldest that actually lives within the NLP community is NLP + databases, which we really haven't seen much of in the past 5-10 years.
I think this is an interesting and fruitful area of future research and is one that I'll probably be exploring myself (but I won't give away what "X"s!).