Comments on natural language processing blog: "Parsing with Transformations"

Theo Vosse (2011-02-13):

It's a pity you guys don't have more contact with computationally oriented psycholinguists (not me, I stepped out 2 years ago). The human mind doesn't do CKY or Earley, and it also doesn't do transformations. There's no evidence for that. For the studies that found some effects of traces, other explanations exist.

Instead, there is a whole set of different ideas in psycholinguistics, but (I have to warn you) it's not ready for large-scale parsing. If you want a good starting point, check out the CogSci proceedings of the last few years.

If you can't be bothered with that, then also forget about transformations. They're just going to generate more alternatives than you care for.
Bob Carpenter (2011-02-10, http://lingpipe-blog.com/):

Kids these days. No sense of history!

There is a vast literature from the 1980s and early 1990s in ACL, COLING, IWPT, the CUNY Sentence Processing Workshop, etc., that addresses (a) resource-bounded parsing, (b) left-to-right parsing, and (c) transformations.

Left-to-right parsing was the whole motivation for Steedman's CCG, for instance (though it's actually broken on the left-to-right front, unlike the fully associative Lambek calculus (circa 1957), which can be proven left associative).

The first paper I ever wrote in stat NLP was an IWPT paper with Chris Manning on using left-corner parsers (trained on the treebank, natch) to do bounded-memory left-to-right parsing that was more natural than shift-reduce or pure top-down from a psychological perspective (e.g., it disfavors center embeddings, which grow the stack, but allows right branching on a bounded stack).

There's a whole tradition of using rich features that goes under the heading "history-based parsing".

There's also a vast literature connecting transformational grammar parsing to formalisms like CG or GPSG. I'd first look at Ed Stabler's work along those lines.

As to our not doing CKY, how about our doing sub-linear parallel CKY? If you have n**6 processors, you can parse in log(n) time. If you have n**3 processors, you can parse in linear time. The real argument is not that our brains don't have the processing power to do CKY, but that we're not actually very good parsers without semantic coherence in a real-world context.

But really, why would you want to use transformations? Have you ever actually tried to write a transformational grammar?

Nathan (2011-01-30, http://nathan.cl):

Heilman & Smith's approach to learning tree edit models for sentence pair tasks (http://www.aclweb.org/anthology-new/N/N10/N10-1145.pdf) seems relevant (at a high level, anyway). They are not seeking to learn a transformational grammar, but they do discover ways of transforming one tree into another that correlate with the nature of the relation between them. Perhaps knowing something about the edits "licensed" by a type of sentence relation could help in parsing?
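Bob's aside about sub-linear parallel CKY is easier to weigh with the sequential algorithm in front of you. Here is a minimal CKY recognizer for a grammar in Chomsky normal form; the toy lexicon and rules are made up for illustration, not taken from any parser discussed in the thread.

```python
from itertools import product

def cky_recognize(words, lexicon, binary_rules, start="S"):
    """CKY recognition for a grammar in Chomsky normal form.

    lexicon: word -> set of nonterminals (unary rules A -> word)
    binary_rules: (B, C) -> set of parents A (binary rules A -> B C)
    Sequential cost is O(n^3 * |G|); the split-point loop and rule
    lookups are what a parallel version would spread across processors.
    """
    n = len(words)
    # chart[i][j] holds the nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]

# Toy grammar and lexicon, made up for illustration.
lexicon = {"people": {"N"}, "fish": {"N", "V"}, "tanks": {"N"}}
rules = {("N", "V"): {"S"}, ("V", "N"): {"VP"}, ("N", "VP"): {"S"}}
print(cky_recognize(["people", "fish", "tanks"], lexicon, rules))  # True
```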
Djamé (2011-01-20):

Hi everyone,

I'm not trying to open already opened doors, but the question of parsing evaluation is really becoming more crucial than ever. Whenever I read a report on experiments stating that this parser performs better than another using only one domain (WSJ section 23) and one metric (Parseval or LAS), I always wonder whether I'm actually learning something about the capacity of a model to provide an analysis of an expressed linguistic fact (whether learned or not), or about its capacity to optimize its score for this or that metric.

Two years ago, at IWPT'09, there was a public discussion about this, and Mark Johnson said, in a tired cowboy voice, something like "Guys, we're only trying to optimize [our parsers] to get a higher Parseval F-score. No more, no less." The implicit assumption being that a higher evaluation score means:
1) scientific knowledge has improved => getting published;
2) the intellectual approach has been validated both empirically and academically => providing the assurance that if someone uses this for a real-world task, they won't get fired for doing so (this point actually comes from Owen Rambow).

Of course, everyone is aware of that, but that doesn't prevent us from chasing the Score as much as we can.

I think that what is really needed at this stage is a multilingual, framework-neutral, multidomain dataset that would provide clear insight about what to use if one needs to parse unrestricted text for a given language. I'm not certain about extrinsic evaluation (say, parser performance in a syntax-based MT system): to fully compare the parsing component, all the other modules have to be the same, so in the end we're just replacing one metric with another, which is likely to be optimized for after a couple of years (that was also said by Nizar Habash at SPMRL 2010). But it should be tried, of course; the first evaluation results will be really meaningful.

@hal: how about launching another survey on what parsers are used for (and which ones) these days?

Emily M. Bender (2011-01-20):

@yoav: In addition to papers like the one Hal points to (and the work of Laura Rimell and colleagues), I think what is called for is more extrinsic evaluation of parsing technology. An example of that is Miyao et al.'s 2008 ACL paper (and the follow-up journal paper).

hal (2011-01-20):

@yoav: Barbara Plank; Gertjan van Noord. Grammar-Driven versus Data-Driven: Which Parsing System Is More Affected by Domain Shifts? (http://aclweb.org/anthology-new/W/W10/W10-2105.pdf)

Yoav (2011-01-20):

@hal: which paper?

hal (2011-01-20):

@Yoav: Totally agree. But see the paper I pointed to in a previous comment for a potential way around this! Of course, whether "ACL" cares about adaptation performance is another question, but I think they at least sort of maybe care.

(Incidentally, I think that it's not just machines that are "overfit" to the idiosyncrasies, but the annotators themselves as well... have you ever read the treebank annotation guidelines???? Gives me nightmares!)

Yoav (2011-01-20):

@hal, Emily: one superficial but substantial obstacle standing in the way of integrating grammar-based and data-driven (= treebank) approaches is that of evaluation, or in other words proving to the "ACL community" that it works.

Currently, the preferred (only?) evaluation metric is F-measure on the treebank, specifically on section 23. I suspect that every grammar-based parser would perform worse on this metric than a treebank-only parser.

This is not because the grammar-based parsers would provide bad analyses -- I believe that they'd in fact be much better in many respects -- but they will probably fail to capture the many idiosyncrasies and ad-hoc annotations in the treebank. I further suspect that our current, ML-based parsers get a lot of their performance advantage from fitting these idiosyncrasies very well.

hal (2011-01-18):

@Emily: (I'm guessing Emily = Emily Bender???)

Great food for thought!

(a) I totally agree about the dichotomy. This is actually why I referred to the Treebank as the "stupid" Treebank: one thing it's done is result in a very weird definition of syntax that I don't think anyone would really agree with. It's not "deep" enough to appeal to Chomskyans, but not functional/lexical enough to appeal to LFG or HPSG types. I'm actually a big fan of LFG, and really enjoyed the paper at the workshop by Barbara Plank and Gertjan van Noord on grammar- versus data-driven parsing in an adaptation setting. It was a very cool way to see that "fancy" linguistic syntax can help.

(b) I don't really want to have the "is linguistics a science" debate, but you're right. Maybe I'm giving people too much credit. I can't even read Chomskyan linguistics papers any more -- I've forgotten too much to understand them at all. But I think you actually could make it falsifiable by doing what linguists on the more empirical side are doing. That said, I also think that there's a false dichotomy between Chomskyan and empirical. As Sproat's challenge hinted, there's no reason you couldn't do both.

(c) Actually that's a great question: how to combine the knowledge in human-written grammars and data!
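Since so much of this thread turns on Parseval F-score, it is worth pinning down what that metric actually computes. This is a bare-bones sketch over labeled constituent spans; the real evalb tool adds many parameters (punctuation deletion, label equivalence classes, and so on), and the example brackets are hypothetical.

```python
def parseval_f1(gold, pred):
    """Labeled bracketing precision/recall/F1 over (label, i, j) spans.

    A bare-bones version of what evalb reports; the real tool is
    heavily parameterized (punctuation, label equivalences, ...).
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                      # matched brackets
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical gold and predicted constituents for a 3-word sentence.
gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("NP", 2, 3)}
p, r, f = parseval_f1(gold, pred)
print(round(f, 3))  # 0.667
```

Yoav's worry above amounts to saying that `tp` rewards matching the treebank's idiosyncratic bracketing decisions, not providing a better analysis.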
Emily M. Bender (2011-01-18):

Thanks for the reply, Hal.

I'm sorry I couldn't attend the workshop in Uppsala. I suspect that if I could have been there, though, even while enjoying it, I would have been frustrated by the same thing as I was at the RING session at COLING 2010: the tendency for those members of the 'ACL community' who are interested in these things -- even those who know better! -- to perpetuate a false dichotomy where the only two choices for sources of knowledge in parsing (that is, parsing English) are transformational grammar (and its current mainstream descendants) and the PTB.

To reply to your point, some linguists (often those who brand themselves as theorists) do seem to be doing work that is not concerned with getting the details right. You give them credit for seeing the big picture. I'd be more cynical, and say that they are doing work that is not empirical (in the original sense), and at worst not even falsifiable. But there are linguists who care very much about getting the details right. (And many, though not all, of us use computers to help us do so.)

I do not mean to say that machine learning is not interesting. I think it could be very interesting indeed to use machine learning in the service of linguistic analysis. But to start with the PTB and a vague notion of transformations is to fail to start from the state of the art. There exist broad-coverage grammars (and associated parsers) in multiple frameworks -- some hand-engineered, some treebank-derived -- that include detailed analyses of questions, long-distance dependencies, and all kinds of wonderful constructions. Perhaps you could use data annotated with those grammars/parsers as a gold standard in an experiment that starts with the PTB and a notion of transformations. But perhaps it would be more interesting to talk to the linguists behind those grammars and find out what they think is both missing from their accounts and difficult to get at through the manual analysis they have done so far.

hal (2011-01-18):

@Mike: thanks for the pointer -- I didn't know about his stuff!

@Kallerdis: indeed -- I had read that a while ago but hadn't thought about it recently. As far as I know, not much :).

@Emily: I agree and disagree :). I definitely think it's unfortunate that one side (non-Chomskyan) tends to acknowledge the other, but not vice versa. That said, I at least personally got filled up with LFG stuff at CMU as an undergrad, and Roumy actually talked about TAG quite a bit during our Syntax class. But I think that's definitely the exception, and I think it's unfortunate. I was ignoring things like LFG and HPSG in the post because when I talked about community acceptance, I really meant "ACL community" acceptance, which only once in a while has a smattering of these things.

I also agree with your assessment that things like transformational grammar are too imprecise to implement. And certainly things like CCG, LFG, etc., have gone a long way in this direction.

What I wonder is whether it's actually okay for things to be imprecise. This was actually sort of the thesis of my position talk at the Linguistics meets NLP workshop in Uppsala: namely, that linguists are great at getting the big-picture structure, but perhaps not so good at getting the low-level details, while machine learning is rubbish at getting big-picture structures, but really good at getting low-level details. So perhaps it's actually enough to know that things like transformations exist (for whatever definition of "exist" you like -- you can read "exist" simply as "are a potential description length reduction" if you want), and maybe some of the parameters that control these things (notions like c-command and the like), and then let machine learning figure out the details. Even though I ended up getting trounced a bit at the workshop, I still feel like this is a reasonable direction to go!

Emily M. Bender (2011-01-18):

While I'm happy to see that you think adding linguistic knowledge like this would be a good idea, I'm dismayed at the apparent presupposition that the only approach to parsing is treebank-based machine learning (or unsupervised machine learning). The logical extreme of adding linguistic knowledge is to create a hand-engineered grammar. This is not impossible!

The actual computational problem with transformational grammar is not the metaphor of transformations, but that the theoretical work is too imprecise for implementation. But there are theoretical approaches to syntax which are precise enough (HPSG, LFG, CCG, ...). It's a major failing of linguistics instruction that these are not at least mentioned at institutions where they are not practiced.

That lack of precision is, I think, part of what is behind Sproat's challenge.
Kallerdis (2011-01-18):

When reading this, I remembered Sproat's challenge to the community (http://linguistlist.org/issues/16/16-1156.html) to come up with a Chomsky-inspired parser. Does anyone know if it led to any results?

Mike (2011-01-17):

You might check out work by Sandiway Fong, both on parsing with transformations (in "government and binding" theory) and more recent stuff on the role of non-declaratives.

Yoav (2011-01-17):

I remember having the same mixed feelings that you describe when I took *my* syntax class, and I definitely agree that this kind of parsing could be really cool.

I am not certain about your motivation, though -- if all you want to do is parse questions or commands, there are many much easier ways of doing it.

Also, if we had a good model of transformations, we could potentially apply it in reverse to the treebank trees and get all sorts of trees (including questions and commands, but all the "other stuff" also), which we could then ("efficiently") parse. Might be worth a shot.

hal (2011-01-17):

@Anonymous: haha, I got it right the second time. That's what too much Perl will do to you! (fixed.)

Anonymous (2011-01-17):

Chompsky?

hal (2011-01-17):

@Suresh: I have no idea what a derivative is -- I'll look into it. Others: see http://matt.might.net/articles/parsing-with-derivatives/ for what Suresh is talking about ;). The key claim: "This post also describes compaction (not in the draft), and makes a formal argument that the cost of parsing with derivatives is O(n|G|) on average."

Suresh Venkatasubramanian (2011-01-17):

What about the data derivative stuff that Matt Might's been talking about?

hal (2011-01-17):

@Anonymous: thanks, but actually it's 2001, not 2011! :).

Anonymous (2011-01-17):

Nit correction: "Fall 2010" for "Fall 2011".
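For readers who, like Hal, haven't met derivatives before: the core idea, due to Brzozowski, fits in a few lines for regular languages. Might's contribution is extending it to context-free grammars via laziness, memoization, and the compaction step quoted in Hal's comment; the sketch below covers only the regular-expression case.

```python
# Regular expressions encoded as tagged tuples.
EMPTY, EPS = ("empty",), ("eps",)
def char(c): return ("char", c)
def seq(a, b): return ("seq", a, b)
def alt(a, b): return ("alt", a, b)
def star(a): return ("star", a)

def nullable(r):
    """Does the language of r contain the empty string?"""
    tag = r[0]
    if tag in ("eps", "star"): return True
    if tag in ("empty", "char"): return False
    if tag == "seq": return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # alt

def deriv(r, c):
    """Brzozowski derivative: the language of r after consuming c."""
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "char": return EPS if r[1] == c else EMPTY
    if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star": return seq(deriv(r[1], c), r)
    # seq: derivative of the head, plus the tail's if the head is nullable
    d = seq(deriv(r[1], c), r[2])
    return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

r = star(seq(char("a"), char("b")))  # (ab)*
print(matches(r, "abab"), matches(r, "aba"))  # True False
```

Matching a string is just repeated differentiation followed by a nullability check; the CFG version additionally needs memoization (derivatives of recursive grammars are self-referential) and compaction to keep the terms small.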