NPIs in Russian (are another excuse to compare HPSG and GB)

Posted on November 11, 2008 by Joshua.
Categories: Syntax.

Yesterday’s reading for Syntax Reading Group was Asya Pereltsvaig’s Negative Polarity Items in Russian and the ‘Bagel Problem’, and it was interesting for a lot of the same reasons that I enjoyed last week’s paper.

Basically, the so-called ‘bagel problem’ is this. There are two (types of) negative polarity items in Russian that are in complementary distribution - the “ni”-items and the “libo”-items. Of course, being in complementary distribution means that they can’t appear in the same environment, but looking for the environment that conditions the choice turns out to be unexpectedly problematic. At first glance, the generalization seems to be semantic - to wit, that ni-items are “stronger” than libo-items. The distinction being made here between “weak” and “strong” basically comes down to how far one is allowed to take the semantic entailments associated with each item, and the difference between “weak” and “strong” items on these grounds is well-established cross-linguistically. “Strong” here means “antimorphic,” something of a complicated term to explain, but here goes.

Something is antimorphic when it meets all of the following conditions:

  1. f(X or Y) implies f(X) and f(Y)
  2. f(X) and f(Y) implies f(X or Y)
  3. f(X and Y) implies f(X) or f(Y)
  4. f(X) or f(Y) implies f(X and Y)

So, for example, never in English sets up an antimorphic context. In each of the pairs below, the first sentence entails the second.

  1. I never sing or dance IMPLIES I never sing and I never dance
  2. I never sing and I never dance IMPLIES I never sing or dance
  3. I never sing and dance IMPLIES I never sing or I never dance
  4. I never sing or I never dance IMPLIES I never sing and dance

Antimorphic, ladies and gentlemen!
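
For anyone who prefers to see the four conditions checked mechanically, here is a minimal sketch. It assumes the simplest possible model, in which X and Y are plain truth values and f is a one-place operator on them; the names is_antimorphic, negation and identity are my own, and modeling an actual natural-language item like never would take more machinery than this. Plain Boolean negation passes the test, while the do-nothing identity operator fails it.

```python
from itertools import product

def is_antimorphic(f):
    """Brute-force check of the four conditions over all truth-value pairs."""
    for x, y in product([False, True], repeat=2):
        # Conditions 1 and 2: f(X or Y) holds exactly when f(X) and f(Y) both hold.
        if f(x or y) != (f(x) and f(y)):
            return False
        # Conditions 3 and 4: f(X and Y) holds exactly when f(X) or f(Y) holds.
        if f(x and y) != (f(x) or f(y)):
            return False
    return True

def negation(p):
    # Classical negation over truth values.
    return not p

def identity(p):
    # An operator that leaves its argument unchanged.
    return p

print(is_antimorphic(negation))
print(is_antimorphic(identity))
```

The check simply pairs conditions 1 and 2 into one equivalence and conditions 3 and 4 into another, which is all the four-way list amounts to.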

Ok, well, a superficial look at Russian leads one to the conclusion that ni-items show up in antimorphic contexts and libo-items show up elsewhere (italicized because syntacticians like to call such things “elsewhere conditions” - that is, an “if a certain condition doesn’t obtain, use me…” kind of thing).

There’s just one problem. It turns out there is one clearly anti-morphic context where you use libo-items instead, contrary to prediction. This is the context defined by the preposition bez (’without’). It’s easy to see that ‘without’ is antimorphic in English:

  1. He showed up without paper or pencil -> He showed up without paper and without pencil
  2. He showed up without paper and pencil -> He showed up without paper or without pencil
  3. He showed up without paper or without pencil -> He showed up without paper and pencil
  4. He showed up without paper and without pencil -> He showed up without paper or pencil

And so it is in Russian too. All these entailments hold. Unfortunately for the theory, bez can’t be used with ni-items.

Pereltsvaig’s solution is to notice that the only difference she can see between bez and all the other antimorphic contexts where ni-items appear is that bez is clearly never a complementizer at any level of the derivation. It’s clearly not a lexical complementizer, and neither is there any reason to suspect that it ever plays a semantic role as a complementizer (in Minimalist/GB-speak - sometimes called ‘GiBberish’ - of course you would say ‘it doesn’t raise to C at LF’). So once again - as with last week - you seem to have gotten your syntax in my semantics. The distinction between ni- and -libo is semantic, except when the licensing item (here bez) fails to be a complementizer, in which case you ignore the otherwise-convincing semantic distinction and use libo instead.

Sounds like a job for HPSG!

Of course, Pereltsvaig’s analysis is all about raising things at LF where we can’t see them, i.e. about as far from HPSG as it’s possible to be. But it’s an interesting question all the same. How would an HPSG account handle this?

My first instinct is to say that HPSG can’t handle it, that this is an example of where Minimalism/GB is superior, and that’s because the distinction is positional. But of course, as soon as that hits the screen I realize that it’s only because I’m training in GB that I think of “what’s a complementizer?” as confounded with “where does it show up?” The idea that “complementizer” has a special position all its own is a GB prejudice. In HPSG, where sentence position isn’t explicitly encoded into the theory (rather, it “falls out” from the inventory of features - certain combinations turn out to be illicit), “complementizer” would just be a feature (erm, set of features) like everything else, and there wouldn’t be any talk of where it shows up (and certainly no talk of where it raises to!) at all.

So actually, HPSG could handle this just fine. And in fact, HPSG handles it a little better, because you don’t get into all the complicated (and probably unfounded) speculation about when words are “inserted” into the derivation. Pereltsvaig’s solution leverages Distributed Morphology - a theory that includes so-called “late insertion,” whereby lexical items aren’t actually inserted for pronunciation until the derivation is complete. (So ni- and -libo items, on the extreme version of this, might just be different pronunciations of the same item - though most Distributed Morphology people would say they are different items, and that you insert the more completely specified item - ni, since it’s “pickier” about where it goes - if you can, the other one - libo in this case - if you can’t.) In HPSG such an issue doesn’t even arise since all items carry with them their complete feature set, and there’s no derivation to speak of. There is no debate about when items get “inserted” as they were always just there, and the theory merely tells you whether they can appear in this order or not.
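
The “insert the pickier item if you can, the other one if you can’t” logic is easy to sketch. What follows is only an illustration under my own assumptions: the feature labels “antimorphic” and “complementizer”, the VOCABULARY table and the insert_item function are all hypothetical simplifications of the description above, not Pereltsvaig’s formalism or anything from the Distributed Morphology literature.

```python
# Hypothetical vocabulary: each item lists the context features it requires.
VOCABULARY = {
    "ni": {"antimorphic", "complementizer"},  # the "pickier" item
    "libo": set(),                            # elsewhere item: no requirements
}

def insert_item(context):
    """Return the most specific item whose requirements the context meets."""
    # An item is a candidate if every feature it requires is in the context.
    candidates = [name for name, required in VOCABULARY.items()
                  if required <= context]
    # Among the candidates, prefer the one with the most requirements.
    return max(candidates, key=lambda name: len(VOCABULARY[name]))

# A context that is antimorphic and involves a complementizer.
print(insert_item({"antimorphic", "complementizer"}))
# A bez-like context: antimorphic, but never a complementizer.
print(insert_item({"antimorphic"}))
```

Because libo has no requirements at all, it is always a candidate, which is what makes it the “elsewhere” item: it wins only when ni is not available.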

One of the original selling points of HPSG was that it bundled syntax with semantics (calling the items over which it operates ‘signs,’ the main grammatical feature of which is called ‘synsem,’ a combination of ‘syntax’ and ‘semantics’) - a move that seems sensible given that syntactic order is sensitive to and clearly affects semantic interpretation. So this would seem to be yet another plus in the HPSG column. But interestingly, reading this paper gave me some appreciation for LF, and why we might want such a thing.

Basically, that’s because semantics is positional in some sense too. Or, at least it’s convenient to think of it that way when we’re being lazy. I guess “positional” is maybe the wrong word - but we humans are in the habit of marking scope from outside to in when we draw up semantic formulae, and I suppose it therefore seems to us that there’s something positional involved. The rub came with some sentences in the footnotes where libo-items appeared in what looked like a ni context. Pereltsvaig explained them away by noting that the items in question were D-linked (they have a discourse-level rather than a purely sentential interpretation) and thus in some semantic sense outside the scope of negation. And in that sense - that is, to the extent that we like to think of something as being “outside” or “inside” a scope - LF is actually kind of nice, because it represents the semantics of the sentence in the way we’re used to thinking about it. Things that “scope over” others “raise” above them in the covert movement phase of the derivation. So first you do your syntactic movement - that is, get the items arranged in a proper syntactic hierarchy. Then you linearize (erm…sequentialize - flatten the tree) this to pronounce it, and also read the semantics off it by allowing more determined movements of distinguished items out of the syntactic arrangement. This has the advantage of making a fairly clear prediction: the semantics/syntax interference only really runs in one direction. Syntax can get in the way of the success of semantics, but not the other way around. Syntax is “primary,” and you only worry about whether the interpretation makes sense once the syntax has been satisfied.

In HPSG, all this happens at the same time. And unfortunately I know hardly anything at all about how semantics works in HPSG, so I can’t really speak to any effect where syntax might block semantics or vice versa, or which direction (if any) the influence tends to run. But I did want to note that this is another area where the theories are not “mere notational equivalents,” as some have accused Chomsky of once having called them. How the syntax-semantics interface is handled in each theory is clearly different. In the so-called “Standard Theory,” there literally is an interface, such that the semantic component of the grammar takes a syntactic structure and operates on it. In HPSG, there is no such interface. Syntax and Semantics run in parallel, and failure of either to resolve at any time renders the sentence ungrammatical.

Repent, and ye shall be forgiven

Posted on November 9, 2008 by Joshua.
Categories: Lexicon.

As a kind of followup to yesterday’s post - I notice that today on PrawfsBlog there is a post about whether to use “data” and “media” with plural agreement. Prawfs comes down firmly on the side of singular agreement - which I personally applaud.

‘Media’ and ‘data’ are mass nouns expressing uncountable quantities. When used as such – and they almost always are – they should be paired with the singular form of verbs. That’s my view, at any rate. And I’d say it’s well accepted.

Here’s the hitch. I’d been saying “data is” and “media is” all my life until I got to grad school, and then I was suddenly surrounded by people who use them prescriptively. It’s gotten to the point that I find myself actually saying “data are” naturally. And by “naturally” I mean not completely naturally because I always do a kind of inner doubletake when I catch myself at it. So it’s definitely something bolted on top of my real grammar, but nevertheless something I say without thinking.

As for “media are,” that’s just British, and I seriously hope that one never slips out (though I do hear it all the time around here).

The interesting question is - how many other people come to academic circles and start saying “data are” just to conform? At what point does it become a critical mass, such that it’s actually native for this register? How many of the people that I’ve picked this up from also started out saying “data is” and then gradually found “data are” constructions slipping out, to the point where now they are, in effect, speakers of a “data are” dialect?

I’m guessing the answer to this last question is “quite a lot.” And I’m guessing that further means that this still counts as a prescriptive change rather than an actual register shift for that reason. And I’m thinking that I’ve just dedicated my life to saying “data is” in polite circles.

As for whether “data are” counts as an example of where prescriptivism helps by making language more precise: I don’t really think so, no. I mean, on the one hand it does, because it tends to emphasize that we’re talking about a collection of data points, each of which is a fact in itself, and so for that reason I suppose one could argue that there is apt to be less confusion about the conclusiveness of data. That is, if you think of data as always and only plural, then you’re less likely to make the mistake of thinking of it as a monolithic thing and more likely to keep in mind that conclusions are abstractions over a mass of evidence. Fine. But on the other hand, any researcher worth his salt should be able to keep this in mind without having to tinker with the language to manipulate himself into it. Fixing this perception is the responsibility of introductory methods classes, not standard English usage. Now, one could argue that it’s nevertheless important that the language lay bare its semantics. And that’s true, up to a point. But there’s also something to be said for grammatical consistency. The truth is that in American English we are largely in the habit of treating collective entities in exactly the way PrawfsBlog suggests we treat “data.” We say “Congress is unpopular” without any confusion on the point that some individual Senators may well be popular, in contrast to their colleagues - or that opprobrium may be focused on the House more than the Senate. But the decisive thing here for me is the stiltedness of “datum.” It’s all very well to insist that “data” is plural if you’re in the habit of using its singular form. But so very few people are - and certainly I’m not one of them. Like most people, I feel more comfortable talking about a “data point” or a “piece of data” than I do a “datum.” In fact, I’d go so far as to say that I’ve never used the word “datum” in my life.
Worse than that, when I consult my language competence I find that I am capable of saying “datums” - as in “the datums in this cluster are closer together than the datums in that one.” No, it doesn’t sound completely natural and no, I would never, ever actually use that with a straight face. But the point is that it doesn’t sound any more ridiculous to me than just saying “well, one datum that I saw said…” Meaning that “datum” isn’t really a word that I’ve internalized. Meaning, in turn, that I don’t have a singular form of the supposedly-plural “data.” Meaning that when I want to talk about a singular “data” I do it in the way that I normally singularize a mass/collective - by using a countable construction containing that “plural.” Conclusion: “data” is not a standard plural for me, and it is only because I’ve been hanging around pretentious academics that I’ve learned to use it that way.

I REPENT!

What’s the Non-Loaded Version of “Crotchety?”

Posted on November 8, 2008 by Joshua.
Categories: Lexicon.

Every profession has its bugbears - those bits of “common sense” that fall into its domain that the public earnestly believes in but which are totally incoherent when examined. For Economists, it’s the make-work fallacy, for Astronomers, it’s the idea that proximity to the sun causes the seasons, for Statisticians it’s likely to be cum hoc ergo propter hoc. For Linguists, of course, it’s prescriptivism.

But click on the link and you will read a good argument that sometimes people take the hunt for Prescriptivists too far.

There are some things that look superficially like prescriptivism but aren’t. One of these is lamenting the loss of a useful distinction. For example, a pet peeve of mine is the incorrect use of abbreviations in footnotes in scholarly writing. All too often nowadays I see v., cf., and viz used as if they all meant “see”… Now, why is my dislike for the conflation of these three abbreviations not prescriptivism? It is because what I decry is not deviation from a standard merely because it is deviation but because it results in the loss of a useful distinction. When I encounter cf. in a recent paper, I can no longer assume that the author is pointing me at a view differing from his own or a study using another methodology. If that is what I am looking for, I may waste a trip to the library. Furthermore, the loss of this distinction is not really a natural linguistic change. After all, the whole system of scholarly apparatus is specialized and artificial. The reason that this distinction is being lost is that those responsible for training scholars have largely ceased to teach it. Students are expected to pick it up, and all too often they fail to pick up on some of the details.

I’m not sure I agree that this isn’t prescriptivism. Specifically, I don’t follow the argument that the fact that a particular linguistic use is “specialized and artificial” absolves it from all such associations. I would have preferred a wording where we acknowledged that this was “prescriptivism” in the broad sense of the term (making normative statements about language use based on a listener’s ideal rather than patterns of popular use), but that it was a permissible example because this is a context where we are outside the normal domain of popular use. Yes, in some sense there is such a thing as a “popular use” of an academic formalism among academics, but using it as a standard is nevertheless linguistically inappropriate because academic discourse was designed with precision in mind and is not meant to be ordinary linguistic communication. By refusing to label anything as “prescriptivist” that is not intended negatively, Poser is, in fact, skating dangerously close to being found guilty of his own accusation: he’s blurring meaning distinctions.

Prescriptivism for me is any time when someone elevates wishful thinking about how language “should” function over evidence of how it actually does function. In most cases, this will be a bad thing for all the familiar reasons. But there are some times when it is not, and Poser’s example of chiding people about “incorrect” use of cf. qualifies as one of them.

An even more interesting example that has been in the news recently: Joe Klein’s quizzical assertion that Palestinians cannot be antisemites:

Here we have the McCain campaign’s execrable Michael Goldfarb slinging around accusations of anti-semitism–a favorite pastime, as we’ve seen this year, among Jewish neoconservatives. I’ve never met Rashid Khalidi, but he is (a) Palestinian and therefore (b) a semite, so the charge of anti-semitism is fatuous. (emphasis mine)

Here’s an example of prescriptivism gone mad, obviously - and yet I think there’s an important point to be made about what Klein is saying.

Clearly, on the face of it, Klein is being ridiculous. In popular parlance, “antisemite” means “someone with an irrational prejudice against Jews.” Being a Palestinian actually makes Khalidi more likely than average to suffer from this malady from a purely statistical point of view. So it’s unfair stereotyping, perhaps, but it’s not “fatuous.” Klein’s assertion here is particularly ridiculous because the reader can be reasonably sure he doesn’t believe it himself. People are not commonly in the habit of analyzing the constituent parts of words and using the inferred meaning in all contradiction to the way people around them use them. Such people exist, of course, (Bill Buckley springs to mind), but they are generally ridiculed as pretentious. No, Klein knows he’s making a mistake here - he’s just angry enough at the time of writing not to care, I assume.

Notwithstanding, I think he (unwittingly) raises a legitimate complaint. Namely - there IS some sense in which the word “antisemite” should mean [opposed to] + [semites]. Here’s why I think so.

Language IS a compositional beast. If I give you a new word - say wug - and tell you it’s a verb and ask you to use it in the past tense, you are likely to come up with wugged, and I am likely to agree with you. There is, of course, some debate about whether that’s really a rule application or just by analogy with “hug” and “tug,” but the debate becomes less acute the longer the word in question. Longer words are generally highly infrequent, and so it begins to stretch credulity that anything other than rule application could be involved. Wugged may well come from hugged and tugged, but wugulforentised? Hardly.

Antisemite itself is a pretty obvious construction from anti and semite - and it helps that anti- is so superproductive in English nowadays that you can apply it iteratively almost without bound (the infamous “anti-anti-anti-missile defense system system system”). This is a word that simply MUST be the result of composition. And so it apparently is. According to the article, it has its origins in German racialist writings of the mid-to-late 19th century. That it rapidly came to be directed exclusively against Jews is simply an artefact of the fact that Jews were common and occupied positions of power in Europe whereas there were few, if any, Arabs about, and what few there might have been would not have been in a position to be seen as politically threatening. But the productivity of “anti-” and the general familiarity with the broader use of “semite” to include not just Arabs but some other races as well means that the compositional meaning of the word is still available in the system. So there is a real tension there. “Antisemite” does mean “anti-jewish,” but it’s still easy for us to see how it could have been otherwise.

I would insist that the key points here are two. It isn’t merely that “semite” is available in its broader meaning, it’s that “anti-” is as productive as it is. Consider another recent controversy - the use of niggardly - meaning “stingy” - which some misinterpret as having a racist etymology. In fact, it is probably of Scandinavian origin and has nothing to do with black people. But the sound association with “nigger” was too much for some people, and so David Howard (a mayoral aide who used the word in a press conference) had to resign his position. Since the word “nigger” is available as a racial slur and since “niggard” sounds like “nigger,” and since “niggardly” seems a plausible adjectival inflection of “nigger,” etymologically uninformed people easily got the wrong impression. In this case that impression was mistaken, but it is important because it illustrates that meaning-building mechanisms for unfamiliar terms do exist. It is because of this that people like Klein are able to exploit the compositionality of “anti-semite” to suggest that it means something other than it does.

So here’s the punchline. I think critique of “antisemite” as a confusing word is also legitimate prescriptivism. I’m not actually advocating that we change the term, of course. What’s done is done - the term exists in its present form and is clearly understood by everyone. What I am saying is that I have some sympathy with people who get fussy about these things as they’re forming. To cite my own personal pet peeve - it irritates me to no end that people call Democrats “liberals” when that term is at odds with how “liberal” is used in Economics. As if it isn’t inconvenient enough that learned people have to juggle two wildly divergent uses of the term in spheres that have a tendency to overlap (discussions of Economics often turn into discussions of upcoming elections), I think political neophytes are actually misled by them. On being introduced to Democrats and Republicans, they ask what each stands for, and a parent, who doesn’t really know, makes the obvious leap of logic and says “well, Republicans are ‘conservatives,’ which means they want to keep things as they are, and Democrats are ‘liberals,’ which means they want to free things up to change.” And if it even stopped there … but it doesn’t. Bill Buckley, on “founding” modern conservatism with the initial publication of National Review in 1955, subtitled his maiden column “standing athwart history yelling stop.” I think it was perhaps a bit too convenient a counterpoint to the then-fashionable Marxist historicism. The Marxists claimed that History was a science, that events were predestined to flow in their direction, and Buckley’s little quip about “standing athwart history yelling stop” then forever after confounded opposition to the pace of cultural change with opposition to socialism. They are not the same thing, and intellectual discourse has suffered for it.

Lest anyone think that these examples are only ever political, let me cite Noah’s favorite pet peeve on this front, which isn’t at all. Noah likes to complain that people mix up “linear” and “sequential.” It doesn’t bother me as much, since I don’t work as much with math as he does, but I can easily see the point. In Syntax, when we speak of “linearization” functions, we should really be calling them “sequentialization” functions - because it isn’t so much the lining up of morphemes that matters as determining their order (indeed, to get nitpicky about it, if you face a linebreak then the final words of your sentence will actually precede the first in left-to-right order). If this seems like splitting hairs - well, it is. But academic discourse, as strongly implied by Poser above, is all about splitting hairs. Academic discourse is artificial precisely so that we can speak with more precision than we do in everyday conversations. “Linear” in its ideal definition says nothing about order and everything about relations. The function that converts inches to centimeters is “linear” because for every one inch you increase length, you have increased it by a predictable 2.54 centimeters. The proportion of inches to centimeters never changes, though the quantity of each certainly does. This meaning of “linear” has to do with lines in that any plot of the function will be one. Of course it’s easy to see where the “sequential” use of “linear” came from: sequences are also easily (and therefore frequently) illustrated with lines. The trouble with this analogy is that it was unnecessary. We already had the word “sequential,” and there was therefore no need to sacrifice precision by expanding the coverage of “linear.” It happened, it’s done, and I don’t think Noah actually advocates for “correcting” people on this front, he just finds it all mildly frustrating.
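
To make the math sense of “linear” concrete, here is a small sketch using the standard factor of 2.54 centimeters per inch. The function name inches_to_cm and the sample lengths are just my own illustration. It checks the two properties that define a linear function, neither of which has anything to do with putting things in order.

```python
import math

CM_PER_INCH = 2.54  # the international inch is defined as exactly 2.54 cm

def inches_to_cm(inches):
    """Convert a length in inches to centimeters."""
    return CM_PER_INCH * inches

a, b = 3.0, 4.0

# Additivity: converting a sum gives the sum of the conversions.
additive = math.isclose(inches_to_cm(a + b), inches_to_cm(a) + inches_to_cm(b))

# Homogeneity: scaling the input scales the output by the same factor.
homogeneous = math.isclose(inches_to_cm(10 * a), 10 * inches_to_cm(a))

print(additive, homogeneous)
```

The comparisons use math.isclose rather than == only because floating-point arithmetic can introduce tiny rounding differences.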

So I think there is a place for what we might call “etymological prescriptivism.” We make normative statements about how people should use certain words in certain contexts on the belief that discourse would in general be clearer if people adopted our recommendations. If there is no need for two words that mean “sequential” and indeed using both of them interchangeably is likely to lead to confusion, then there is a case for moderating at least one’s own speech to try to eliminate the overlap. And it is on this basis that I make a point of avoiding using “liberal” to describe socialists. Since “socialist” is a loaded term, I am polite enough to say “leftist” instead - but the point is that I think political and economic discourse would gain by finding a more convenient way for people to separate classical liberals from contemporary liberals since the two are not of the same philosophy at all. And finally, yes, I think it would be worthwhile coming up with a term that means what “antisemite” originally meant. It’s not, after all, difficult to imagine people who are opposed to both Jews and Arabs, and for the same reasons. Perhaps it isn’t an anthropologically useful category, but there is certainly a political use for a term that means “people of the Holy Land,” regardless of whether they are Jews or Arabs. Certainly some will object that we shouldn’t be in the business of manufacturing politically uncomfortable categories. But I would respond that this is the same dodge that the politically correct crowd uses. Rather than dispute the ideas, they seek to change the terms, with the result that all the attitudes they oppose merely linger beneath the surface. If you want to fight something, it helps to be able to name it.

As for Linguists - I think there are many ways in which the crusade against “prescriptivism” has gone too far. It’s a bit like opposing “goto” statements in Computer Science. It isn’t that it’s not a good idea in most cases, it’s just that it’s impossible to do completely, and so a more open discussion of the topic wouldn’t hurt. I can remember a sly Dan Friedman in class saying “because a function call without any arguments is a goto” to gasps from the crowd, and it was really gratifying. The point was just that you can’t completely eliminate gotos from your semantics, even if you can stop providing the programmer with easy access to them in the way you style your language. Well, so it is with Linguistics. Saying that prescriptivism is always bad disempowers people from employing it in those rare cases when perhaps they should. Certainly it stops them from recognizing it in action in all its forms.

So let me get prescriptive about prescriptivism. I think we could do with a more honest definition of the term - one which means what it means now, but without the negative spin. “Prescriptivism” should go back to being an academic term rather than a value judgment, and people can state their value judgments independently of the term. And then we can employ “prescriptivism” in instances where it perhaps should be employed - for example, in lamenting the collapse of a distinction between socialism and classical liberalism in modern political discourse - or, indeed, in complaining that academics no longer know what all those Latin initials “really” mean.

In short, time for a breath of fresh crotchetiness after all this stifling flower power nonsense.

Not Notational Variants (Exactly)

Posted on November 3, 2008 by Joshua.
Categories: Syntax.

In Syntax Reading Group we’ve been reading Negation in Slavic, a collection of papers on the titular subject. Today’s was The Morphosyntax of Polish Verbal Negation: Towards an HPSG Account by Anna Kupść. It’s an interesting paper because it really hammers home the differences between HPSG and the so-called “Standard Theory.”

There’s an attribution - probably apocryphal - of Chomsky saying that HPSG is nothing but a “notational variant” of “mainstream” syntax. It’s tempting to write this off as either trite or condescending. It’s trite in the sense that any syntactic framework should aim to account for the full range of syntactic phenomena; it’s condescending in that it’s uncharitable to think of theories designed by such intelligent people as Pollard and Sag as motivated by nothing other than notational preferences. In either case, it begs the question why anyone would bother to take the time to make up a theory that adds nothing to the discussion? But pause on that and it immediately occurs to you that in fact no one has ever written a full comparison between HPSG and the so-called “Standard Theory.” The apocryphal Chomsky quote may be on point, for all we know!

So whenever I read HPSG papers I’m constantly on the lookout for things that would clue me in to what the fundamental differences are. Are there grounds for preferring one framework over another, and if so, what are they?

For me the “HPSG question” has always broken down like this. The advantages to HPSG are two: it is not (obviously) directional, and it really only possesses a single mechanism for explanation. The first is nice because humans are consumers as well as producers of utterances. The so-called “Standard Theory” is good at production, but not so much at parsing. Working one way, everything is nicely restricted; working the other way, it’s a hopeless data explosion. The second is nice because it keeps researchers honest. One cannot simply invent mechanisms willy-nilly to account for new findings as is possible in the so-called “Standard Theory.” Everything in HPSG must be explained in terms of feature unification.

The so-called “Standard Theory” has only one advantage, but it’s hugely important: GB/Minimalism makes transparent generalizations. With HPSG, everything must be explained in terms of local feature unification, even when it’s not obvious how to do that. In particular, this makes word order and long-distance dependencies a bit problematic. There’s no doubt whatever in my mind that HPSG can capture all of the relevant generalizations, it’s just that several of them require rather elaborate feature specifications of the type that sometimes leave one with the impression that it captured the data but missed the point.

Of course, this subject requires a book-length treatment, hardly the sort of thing that can be handled properly in a blog post. I just wanted to say that the Kupść article linked above is a nice illustration of one area where the two frameworks are not mere “notational variants,” and where I think HPSG is better suited to the data.

The problem with Polish negation is that although everyone agrees that negation is a syntactic phenomenon, it behaves in Polish in some ways as though it were an entirely lexical phenomenon.

For example: some Polish verbs are only ever negative, and others seem to have no negative form at all. For some verbs, the negative particle can be separated from the verb by an auxiliary, for others it may not. This sort of “case-by-case” approach to rules is typical of lexical phenomena. Syntactic principles, by contrast, should be universal.

Without really getting into the details, the analysis in the paper takes the approach that some verbs are prespecified for being negative or not in the lexicon. For these verbs, negation is not really a syntactic but a lexical phenomenon. For all the other verbs, negation is syntactic just as it is in all other languages.

The point is that in HPSG, where all syntax is in the lexicon (in the form of lexical features) anyway, you have a good way to “fudge,” as it were, on whether a phenomenon is lexical or syntactic. It works like this. All words (of any category) come with a head feature NEG. For most words this will be unspecified. However, for some it is specified in the underlying lexical entry. Syntactic rules in HPSG operate by taking two items (in the typical case - but sometimes it’s more, sometimes less) that meet their structural description and unifying with those items. In other words, the rule itself acts like a lexical structure that simply fills in its missing blanks with subordinate structures (which can be words, or other such complex structures, actually). Rules can be made to apply to some items but not others by setting features on a “rule” object in such a way that they won’t unify with certain items. In the crudest case, you could simply make a boolean feature “Applies-to-me?” and set it to + or -, and then set the same feature to + on the rule, thereby excluding any items that were pre-specified as -.

So for Polish negation it’s quite simple, actually. Rules can be set up so that they will only unify with items that are not already specified [NEG +]. Those items that are so specified cannot be the arguments to the rule, and thus will behave differently from items that are unspecified for NEG. In a very elegant way, you resolve your “tension” between the language-independent generalization that negation belongs in the Syntax on the one hand and the hard evidence that some Polish verbs form lexical exceptions to this generalization on the other. Since all Syntax is in the lexicon for HPSG, it is easy to write rules that apply to some words and not others - without losing sight of the larger generalization. This is very cool.
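To make the mechanism concrete, here is a toy sketch in Python. It is only an illustration under heavy simplifying assumptions: feature structures are flattened to plain dictionaries, a missing key stands in for "unspecified," and the names `unify`, `NEG_RULE_DAUGHTER`, `apply_negation_rule`, and both lexical entries are hypothetical. None of it is taken from the Kupść paper or from any real HPSG system.

```python
from typing import Dict, Optional

FeatureStructure = Dict[str, str]

def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Merge two flat feature structures; return None if a shared feature clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values, so unification fails
        result[feature] = value
    return result

# Hypothetical lexical entries. A missing NEG key means "unspecified".
PLAIN_VERB: FeatureStructure = {"HEAD": "verb"}
LEXICALLY_NEG_VERB: FeatureStructure = {"HEAD": "verb", "NEG": "+"}

# Hypothetical description that the syntactic negation rule imposes on its
# verb daughter: it refuses anything already marked [NEG +].
NEG_RULE_DAUGHTER: FeatureStructure = {"HEAD": "verb", "NEG": "-"}

def apply_negation_rule(verb: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the negated mother structure, or None if the rule cannot apply."""
    daughter = unify(verb, NEG_RULE_DAUGHTER)
    if daughter is None:
        return None
    mother = dict(daughter)
    mother["NEG"] = "+"  # the phrase built by the rule is negative
    return mother

print(apply_negation_rule(PLAIN_VERB))          # rule applies
print(apply_negation_rule(LEXICALLY_NEG_VERB))  # rule is blocked
```

The unspecified verb unifies with the rule's daughter description and comes out negated; the verb that is already [NEG +] in the lexicon clashes and is simply not a possible argument to the rule. That is all the "lexical exception" amounts to in this toy version.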

I don’t know what the Standard Theory would even do about cases like this. In that theory, for all its pretensions to being a “lexicalist” approach, Syntax and Lexicon are actually quite separate, and it’s hard to write your Syntax in such a way that it applies to some distinguished items of a class in different ways than it does to other members of the same class.

The point of this is not to advocate for HPSG - though certainly I think this case is a plus in the HPSG column. The point is just to note that the two theories are NOT “notational variants,” that though it perhaps looks like that in a lot of cases, they do represent very different approaches to the study of syntactic phenomena grounded in different priorities, and some syntactic phenomena come more naturally to one theory, others to the other.

Of course, as a Computational Linguist, I think all such disputes should be resolved on the basis of which comes with the more tractable implementation algorithms!

The Merit in What I Do

Posted on October 21, 2008 by Joshua.
Categories: The Field.

One of the reasons that I like the “Computational” part better than the “Linguist” part of my job description is that the “computational” part doesn’t give you any bullshit. A program either runs or it doesn’t, it either gets the result it’s supposed to get, or it doesn’t. And while for very complicated programs it’s not always immediately obvious that it’s not doing what it’s supposed to, it eventually not only becomes clear that it isn’t, but it is always possible, with a certain amount of effort, to explain why it’s failed.

In Linguistics, by contrast, it can be frustratingly hard to separate out the assumptions from the conclusions. And indeed, certain Linguists - infamously including Noam Chomsky - actually take advantage of that truth to avoid criticism. The Minimalist Program is a “Program” and not a “theory,” after all, because it wants to adopt certain assumptions without having to justify them.

I’m actually not opposed to this style of research in principle. In fact, I’m not sure how else Syntax is supposed to operate. As van Riemsdijk and Williams put it in the introduction to their excellent syntax primer:

The material in this book constitutes a detailed and specific theory of grammar. As such, it naturally rests on strong assumptions about the domain of phenomena that the theory of grammar is about, and about the role of the theory of grammar in the general theory of language. These assumptions are supported to the extent that the resulting theory of grammar gives satisfying explanations, and to the extent that it supports or “meshes with” theories concerning other aspects of language.

Right. There is no other way to do Syntax - and that’s a fault of the fact that, again using the words of van Riemsdijk and Williams, “It is by no means obvious that the study of grammar is not an arbitrarily defined subdiscipline most properly dissolved in favor of some combination of studies.” Put another way, while it seems obvious to me that there are syntactic phenomena, it is not perfectly obvious, and for that reason people in my line of work often feel the need to apologize for what they do.

They don’t, of course, actually apologize. What they tend to do instead is internalize these feelings behind walls of dogma, perceiving - largely correctly, in my experience - that they are surrounded by people who think what they do is meaningless.

So it’s nice to read in the Briggs Blog today someone who thinks this is a characteristic of any “scientific” field that approaches the humanities. Quoting the man himself:

The closer a field of study is itself to politics or any area which involves human behavior, the more the consensus acts to keep people in line than it does to promote innovation. Non-consensus ideas are not welcome. Professors holding verboten thoughts are not hired, or if they are found out, they are let go, or they even leave voluntarily, tired of the process.

So it’s not just us. It’s Psychology, and Economics, and Sociology, and All that Jazz too. And he gives a possible remedy:

The solution seems to be, because people in areas which involve humans are prone to ill-informed zealousness, that they should all be taught and consistently reminded that they might be wrong. This is the reason, after all, that, on average, people involved in physical areas are humbler: they have seen and verified their failures, and they have seen and acknowledged that their predictions sometimes are a bust.

I would say that’s actually the lesser half of the story. The greater half is that they know their colleagues have experienced similar failures. One of the things that I noticed about Computer Science culture when I started taking classes in that Department is how much failure professors admit compared to students. Which is to say, a lot relative to virtually none at all. And it isn’t too hard to figure out why: professors are tenured and proven, while students are still in competition with each other. So you get these odd situations where the professors come off looking really dumb, admitting to the suboptimal solutions they originally found to the problems they’re writing on the board, or confessing that they can’t read Java code, or whatever - while the students are busy stretching their hands as high in the air as possible to drop comments about having casually done something last night while messing around that’s known to be difficult. In reality, of course, the professors know the subject much, much better - the difference is just that students don’t feel comfortable admitting failure in public yet because they haven’t seen their colleagues do it.

I think the trouble with Linguistics isn’t that we’re not constantly reminded we could be wrong. Au contraire - Linguists are more brutal about this than people in most fields I know. They LOVE pointing out their colleagues’ mistakes. What’s lacking isn’t the Penance, in other words, it’s the Priest. We’re constantly casting stones and reminding each other just how wrong it’s possible for us to be - the problem is that there isn’t anything forcing anyone to admit that a blow’s been landed. And so we don’t get the critical mass of examples of colleagues publicly admitting failure necessary to create a comfort zone in admitting failure ourselves. It’s an Economics question, really. When a good is scarce, it’s expensive - when it’s ubiquitous, it’s cheap. If you’re in a profession where examples of failure are “a dime a dozen,” to cash in on the pun, then admitting failure costs you nothing. But if you’re in a field where people rarely admit it (because they rarely have to), then the cost of a public confession of failure is quite high, and you think twice about it.

So I don’t think the remedy is reminding people that they “could be wrong.” I think the remedy is finding ways to prove people wrong and employing them mercilessly. There’s that oxymoronic military line about how “we had to destroy the village to save it.” In science, I’m not sure it’s an oxymoron. I think a little bloodletting is actually healthy. It’s sort of the way you have to first train a fighter to take punches before you teach him to avoid them. I think the main problem in humanities-adjacent fields like Linguistics is that people don’t take enough punches, and so they’re so scared of them that they curl into little balls in the corner of the ring rather than getting up and having it out. More accurately, what they don’t realize is that it takes more than a single blow to fell a man. Anyone can take a couple of punches - and in fact you don’t generally get in a position to win a fight without getting close enough that many of your opponent’s punches land. Linguists need to get away from the notion that a single counterexample disproves a theory, that any single punch is going to be a knockout blow.

How to accomplish it? My experience is that the laws of Economics may be subtle, but they are laws. So one thing I know isn’t going to work is direct approaches - like reminding people to remind each other to be humble because they may be wrong. The only way to fix it is to change the incentives, to, as it were, lower the price of failure. And the only way I know of to do that is for there to be a lot more failure about for people to see. I can’t solve it - but I think I can make a contribution. A parser-generator for Minimalism along the lines of the LKB for HPSG will at least realize the possibility that there could be a database of sentences that have been used in syntax papers against which people could test their tweaks to the theory - to see just which sentences that were formerly grammatical are no longer predicted to be under the new version of the theory, for example.
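A minimal sketch of what such a regression check might look like, in Python. Everything here is hypothetical: a "grammar" is reduced to a function from a sentence to a grammaticality verdict, and the two toy "versions of the theory" and the three-sentence corpus stand in for a real parser and a real database of examples from the literature.

```python
from typing import Callable, Dict, List

# A "grammar" is modeled as a function from a sentence to a verdict.
Grammar = Callable[[str], bool]

def regressions(old: Grammar, new: Grammar, corpus: Dict[str, bool]) -> List[str]:
    """Sentences whose recorded judgment the old grammar matched but the new one misses."""
    return [
        sentence
        for sentence, judgment in corpus.items()
        if old(sentence) == judgment and new(sentence) != judgment
    ]

# Toy corpus: sentence -> judgment reported in some (imaginary) paper.
CORPUS = {
    "the cat sleeps": True,
    "cat the sleeps": False,
    "the cats sleep": True,
}

def old_theory(sentence: str) -> bool:
    return sentence.startswith("the ")

def new_theory(sentence: str) -> bool:
    # A "tweak" that accidentally rules out plural subjects.
    return sentence.startswith("the ") and sentence.endswith("sleeps")

print(regressions(old_theory, new_theory, CORPUS))
```

Run against the toy corpus, the check reports the one sentence the tweak broke. The point is only that the failure becomes public and mechanical rather than a matter of who argues longest.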

Another Reason Why Girls Might Say ‘Holded’

Posted on October 18, 2008 by Joshua.
Categories: Psycholinguistics.

Joshua K. Hartshorne and Michael T. Ullman, “Why girls say ‘holded’ more than boys,” Developmental Science 9:1 (2006): 21-32. [PDF]

One of the reasons I feel confident that the pendulum has started its swing back toward symbolic approaches in language research is that the reductionist crowd now regularly engages in all the reckless conclusion by assumption on which they (rightly, in many cases) originally based their criticisms of the symbolic approach. The paper reviewed here is as brazen an example of Affirming the Consequent as one is likely to find in which the authors still bother to collect data.

The overall problem is this: a series of recent studies have shown that in general females outperform males on verbal memory tasks such as recalling words from a list. For this reason, we might expect young girls to overgeneralize less often than boys when producing past tense forms. That is, we might expect that girls would be less likely to produce the ungrammatical holded in place of the grammatical held, and similarly for similar examples. The basis of such a hypothesis is the intuitive belief that regular forms are produced by a rule (e.g. of the form “add -ed to a stem to form the simple past tense”), whereas irregular forms must simply be memorized. This is an appealing notion primarily for reasons of memory efficiency: while there is perhaps a performance gain in memorizing frequent regular forms for rapid retrieval (e.g. worked), it seems a waste of brain space to bother explicitly storing multiple forms for infrequent items (e.g. pardon and pardons and pardoned) when the manifest regularity of the lexicon provides such an obvious optimization opportunity.

Nothing, of course, can be asserted without confirmation in science, and the researchers found, in the course of trying to document this assumption, that in fact just the opposite was the case. It seems girls are significantly more likely than boys to produce the overregularized forms, even addressing all the obvious confounds (age of speaker, priming by adult conversation partner, token frequency of use, number of utterances produced, etc.). This obviously poses something of a puzzle. Either the studies showing female superiority in verbal memory are flawed, or the relative inferiority at the task in question is a clue to the mechanism behind female superiority in verbal memory tasks generally.

Taking the second route, the authors hypothesize that if the girls have greater associative memory skills - at least for linguistic forms - they may in fact be producing generalizations on that basis which are extended to forms which should not be generalized. That is, forms like folded get in the way for girls, who have generally greater facility in retrieving them, when trying to produce held more than they do for boys.

This yields a testable hypothesis. If such interference is in fact occurring, then it should be predicted by neighborhood effects: items that “sound like” many other regular forms should be more likely to interfere than those that “sound like” comparatively fewer regular forms. Defining “sounds like” gets into controversial territory, of course, so the authors ran tests on three separate interpretations:

Rhyme:
the forms in question rhyme (“sinked” - linked, blinked)
Final Coda:
the forms in question end in the same string (“sinked” - linked, blinked, flunked)
Final Consonant:
the forms in question end in the same phoneme (“sinked” - linked, blinked, flunked, barked)

And since irregulars tend to be monosyllabic, these numbers were calculated twice - once for all relevant forms, once only for monosyllabic forms. In total, six conditions, then.
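The three measures can be sketched in a few lines of Python. This is a toy illustration and not the authors' procedure: the ASCII phone symbols, the tiny lexicon, and the function names are all made up for the example, and I am assuming "rhyme" means the last vowel plus everything after it and "final coda" means the consonants after the last vowel. Frequency weighting and the monosyllabic-only variant are left out.

```python
from typing import Callable, Dict, List, Sequence

VOWELS = {"I", "V", "A"}  # toy vowel symbols used in the transcriptions below

def last_vowel_index(phones: Sequence[str]) -> int:
    return max(i for i, phone in enumerate(phones) if phone in VOWELS)

def rhyme(phones: Sequence[str]) -> tuple:
    """Last vowel plus everything after it."""
    return tuple(phones[last_vowel_index(phones):])

def final_coda(phones: Sequence[str]) -> tuple:
    """The consonants following the last vowel."""
    return tuple(phones[last_vowel_index(phones) + 1:])

def final_consonant(phones: Sequence[str]) -> tuple:
    """Just the last phone."""
    return tuple(phones[-1:])

MEASURES: Dict[str, Callable[[Sequence[str]], tuple]] = {
    "rhyme": rhyme,
    "final coda": final_coda,
    "final consonant": final_consonant,
}

# Toy lexicon of regular past tense forms, one symbol per phone.
REGULARS: Dict[str, List[str]] = {
    "linked": "l I N k t".split(),
    "blinked": "b l I N k t".split(),
    "flunked": "f l V N k t".split(),
    "barked": "b A r k t".split(),
}

def neighbors(target: Sequence[str], measure: str) -> List[str]:
    """Regular forms that 'sound like' the target under the given measure."""
    key = MEASURES[measure](target)
    return [word for word, phones in REGULARS.items()
            if MEASURES[measure](phones) == key]

SINKED = "s I N k t".split()
for name in MEASURES:
    print(name, neighbors(SINKED, name))
```

Each measure is looser than the one before, so the neighborhood for “sinked” grows from two forms to three to four, matching the examples in the list above.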

The correlations were significant and positive for girls in all 6 conditions save one (rhyme measure over monosyllabic verbs - for which it was positive but not significant) - which is to say, girls are more likely to overregularize those irregular verbs for which there are lots of highly frequent similar-sounding regular examples. Boys showed no such correlations at all, let alone significant ones.

There is no reason to raise questions about the data. All of it comes from the widely-available CHILDES Corpus and is therefore easily replicable by anyone who has or is willing to design appropriate software. What is interesting here is not so much what was found but how it is interpreted.

Put crudely, what the authors claim on the basis of these correlations is that girls are generally better than boys at verbal memory tasks, even when they’re not, and that indeed when they’re not it’s because they are. Their greater facility with retrieving stored memory items by association means that girls are more easily confused by regularities in similar-sounding forms (having stored folded and molded, she reasons, on the basis of sound, that there is likely to be a holded, and this short-circuits the retrieval of the dissimilar held). Boys, who either are not as gifted at these sound-based associations or else are simply not as efficient at retrieving their exemplars, are less likely to be misled by their ability and thus are doomed by their verbal inferiority to retrieve the correct answer more reliably. Something is obviously in need of more explanation.

While there is no basis for doubting the data, there is perhaps reason to assume that this interpretation is a bit selective. To get the tiresome and obvious out of the way - yes, there is some evidence for politically correct bias. From page 30:

However, we are not claiming that females depend only on lexical memory for processing complex forms. Even with their excellent memory abilities, females are expected to compose many types of complex forms, including new and lower frequency regulars, and highly complex linguistic representations, including most phrases and sentences (citations).

The word “excellent” stands out here. “Excellent” by what standard? Surely it is the case, as with all such distributions over populations, that there is great variation within groups on level of ability. An individual can have “excellent” memory ability by standing out among his peers; “girls” as a group simply have a somewhat greater tendency averaged over the group to excel in this area than do boys.

But the concern here is not so much with possible politically correct biases as with research biases. Notice the potential explanations we’re ruling out without properly addressing. Foremost is the possibility that it isn’t so much that girls are better at verbal memory as that boys are better at rule-based learning. If indeed rules are in part a method of data compression - the ability to leverage regularities in the lexicon in the reduction of processing load - then the fact that boys store fewer exemplars is a feature and not a bug. “Less efficient at retrieving” (spurious) exemplars may simply be a theoretically-biased way of saying “have internal search engines with greater precision.” Boys’ brains are better-attuned to capturing the real regularities in the lexicon. That’s an important point, since any measure of competence surely concerns itself as much with the appropriateness of the method employed to the task at hand as with realtime performance at that task (though of course the two are related). It’s a bit like calling a computer that evaluates loans on the basis of the applicant’s credit score “equally competent” as a loan adjustor. Given its performance on a certain set of data, it may appear that it is (and given a lazy loan adjustor, it may even be so in a particular case). The reality remains that the computer is using proxy data to approximate the real task. As with credit scores, of course, sound associations are a very good approximation of the actual regularity because they are themselves symptoms of that regularity. The “Final Consonant” condition in that task above, for example, would be explained by a traditional grammarian with reference to productive phonological rules. You know - the past tense morpheme is voiced when it attaches to a stem that ends in a voiced segment, voiceless otherwise, and there’s epenthesis in some cases (or however the official version goes - I’m not a phonologist). 
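For concreteness, the usual textbook statement of that rule fits in a few lines of Python. This is a sketch under simplifying assumptions: the function name and the toy ASCII segment symbols are invented for the example, and only a stem's final segment is considered.

```python
# Toy symbols for voiceless consonants other than "t" (which triggers epenthesis).
VOICELESS = {"p", "k", "f", "s", "S", "tS", "T"}

def past_tense_allomorph(final_segment: str) -> str:
    """Pick the form of regular -ed from the stem's final segment."""
    if final_segment in {"t", "d"}:
        return "Id"  # epenthesis after an alveolar stop: "folded", "wanted"
    if final_segment in VOICELESS:
        return "t"   # voiceless after a voiceless segment: "linked", "barked"
    return "d"       # voiced otherwise, vowels included: "hugged", "played"

for stem, final in [("link", "k"), ("fold", "d"), ("hug", "g")]:
    print(stem, "->", past_tense_allomorph(final))
```

A rule like this guarantees that every regular past tense form ends in one of a handful of predictable sound sequences, which is exactly the sort of surface pattern a sound-based association could pick up on.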
Whether or not one believes in such a rule is largely a matter of academic preference, the verdict on which is ultimately dependent on some as-yet-to-be-completed research and philosophizing. What is not in question by either side of that debate is that such a rule, if a psychological reality operational in a language, would produce a dataset ripe for exploitation by the kind of “Final Consonant” sound-based associative method described above. One could ape the regularity before he had learned the actual rule in exactly the way that this study suggests that girls have a tendency to do. That the study chooses to phrase its conclusions in terms of a superior verbal ability on the part of girls, rather than a deficit at rule-based learning that is being compensated for by a proxy crutch, owes to a research bias that favors reductive explanations. (To see that it is a bias, notice that in the passage quoted above the fact that girls can employ linguistic rules is treated as evidence that they always do. Whereas we are asked to assume that there are differentiated relative levels of ability at associative memory, no such assumption seems to be in play for these authors about rule application.)

Reductionism is a kind of unavoidable disease of science. It is the result of twin concerns, each legitimate in its own right: (1) the need to avoid circularity and (2) the need for a transparent mechanism to underlie our explanations. The first needs no justification: explaining something by naming it is no explanation at all. The second is of course related to the first: we don’t feel that we’ve really understood something unless we can replicate it. Reductionist explanations are often appealing here as a way of capturing noise along with the regularities. But the Turing Test for “thinking” is inappropriate for exactly this reason: testing whether something is human is not the same as testing whether it is conscious; humans have some characteristics that are probably incidental to consciousness per se. If we convince someone that something is human by concentrating on where to build in the pauses/hesitations in its speech, we’ve passed the test without really answering the question. Of course it may turn out that the pauses/hesitations are inevitable consequences of the mechanisms that underlie consciousness (in which case modeling them is undeniably useful), but there is no a priori reason to believe so other than assumption. Because of the kind of hair-splitting nonsense that philosophical discussions often produce, I think we’re right to give some weight to operational definitions in science. What concerns me is that they not take the place of real explanation when such is possible. Laboratory word-association tasks are, after all, not real-world linguistic tasks so much as tools for approaching answers to questions about how such real-world tasks are done. It is an error to confound ability on verbal word association tasks in the laboratory with real verbal ability. The one is merely a proxy for the other.
The only thing the alternate explanation offered for the phenomenon under discussion - that it is a comparative deficiency in the rule-application abilities of girls rather than in the sound-associative memory of boys that carries the weight of the explanation - has working against it that I can see is that it would tend to associate the bearer with a currently-unfashionable belief in a mental “rule application” mechanism of symbolic flavor. What it has going for it, of course, is that you don’t run into the absurdity of claiming that evidence of one’s lack of linguistic ability shows just how good at language she really is…

To be fair, the authors do note some of these problems as outstanding issues that will need to be addressed by future research. Noting this has, however, not prevented them from titling their paper “Why girls say ‘holded’ more than boys,” as if this were a question they’d answered decisively. More to the point, it has not prevented them from talking as though they had done so throughout their paper, tempered only by a single caveat near the end. It’s a classic research error. If P then Q, observe Q, conclude, on that basis, P. It’s a named fallacy, fellas.
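For anyone who wants to see why the inference fails, a brute-force truth table in Python does the job. The names are just for illustration; the check simply looks for an assignment where both premises hold and the conclusion does not.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if P then Q'."""
    return (not p) or q

# Assignments where "if P then Q" and "Q" both hold, yet P is false.
counterexamples = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and q and not p
]

print(counterexamples)
```

One counterexample turns up: P false, Q true. Here that is the case where girls overregularize more and show the neighborhood correlations, but for some reason other than superior associative memory.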

Talkin’ ’bout my G-g-g-generator

Posted on June 27, 2008 by Joshua.
Categories: General Linguistics, Syntax.

A useful thing for some (other) linguist to do would, I think, be to set up a website for cataloging bad arguments in favor of UG. I’ve just run across a beauty.

I’m writing qualifying papers this summer, some of which are about Syntax, and I thought it might be a good idea to start with some foundational stuff. I never really had a proper foundational course in syntax - for *ahem* various reasons - and most of what I know has been picked up from reading LI and sitting in on discussion groups. There’s a lot of arcana in the field these days, so it never hurts to pick up a textbook and start over … was my reasoning.

So I’ve been flipping through Andrew Carnie’s book, and last night I read the inevitable introductory pro-UG argument. These things are apparently required by Holy Writ of Trade Guild for books on mainstream Syntax. Well, not really - the general argument is that the “standard” approaches to Syntax don’t make sense without UG, which is probably true in some broad sense, but not really in the narrow sense they generally mean. Notwithstanding, every textbook I know starts out with some throwaway rationalist argument that just doesn’t really work. So here’s Carnie’s in a nutshell.

Rather than imagining the trouble inherent in learning a language, which is apparently considerable, we’ll imagine instead someone simply matching sentences with situations. Say, the sentence is the cat spots the kissing fishes, and the child has to match this with a situation (Carnie helpfully provides an illustration of a cat spotting some kissing fishes[sic]).

Her job, then, is to correctly match up the sentences with the situation. More crucially she has to make sure she does not match it up with all the other possible alternatives, such as the other things going on around her (like her older brother kicking the furniture, or her mother making breakfast, etc.).

No objections there. I believe the most quoted statement of this problem comes from Quine, who tells a story of some natives shooting a rabbit and saying “gavagai” and leaving the accompanying white dude puzzled as to whether “gavagai” was the rabbit, or the arrow, or the act of shooting, or some kind of cheer, or … WHAT EXACTLY GORAMIT??? Point being, we’re glossing over some difficult issues blithely saying that kids “hear words in context and pick up their meanings.”

Of course, it’s interesting that Carnie should choose this problem to illustrate in a book about syntax. Surely this is a general learnability issue? I mean, this applies as much to learning words in isolation as it does to learning how to string them together, no? So in that sense it’s kind of an odd retreat to beat.

It gets better.

Let’s make this even more abstract to get at the mathematics of the situation. Assign each sentence some number. This number will represent the input to the rule. Similarly assign each situation a number. The function (or rule) modeling language acquisition maps from the set of sentence numbers to the set of situation numbers. Now let’s assume that the child has the following set of inputs and correctly matched situations (perhaps explicitly pointed out to her by her parents). The x value represents the sentences she hears, the y the number correctly associated with her situation.

And then he gives a table, but let’s make it easier on me typing and just say that 1 gets mapped to 1, 2 to 2, 3 to 3, 4 to 4 and 5 to 5. So the question is, given 6, what do we map it to? Well, you might be tempted to say “6,” but then, foolish mortal, you would have fallen victim to Carnie’s Clever Trap®! In fact, suppose the mapping function isn’t identity, but rather [(x-5)*(x-4)*(x-3)*(x-2)*(x-1)] + x = y. GOTCHA! In this case, x=6 maps to 126. Oops!
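The arithmetic is easy to check. In the Python sketch below, the function names `identity` and `carnie` are mine, chosen only to label the two candidate mappings; the formula is the one given above.

```python
from math import prod

def identity(x: int) -> int:
    """The mapping the learner is tempted to infer."""
    return x

def carnie(x: int) -> int:
    """The alternative mapping: (x-5)(x-4)(x-3)(x-2)(x-1) + x."""
    return prod(x - k for k in range(1, 6)) + x

# Both functions agree on every observed input ...
for x in range(1, 6):
    print(x, identity(x), carnie(x))

# ... and come apart on the first unseen one.
print(6, identity(6), carnie(6))
```

For every x from 1 to 5 one factor of the product is zero, so the extra term vanishes and the two mappings cannot be told apart from the data. At x = 6 the product is 5·4·3·2·1 = 120, which gives 126.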

And actually, I don’t mean to be facetious. This is quite a good example. The trouble is that it’s nothing specific to syntax or even to language. Yes, it does indeed demonstrate rather nicely a general learnability problem, but how does this imply the existence of UG?

I’m actually quite sympathetic to the idea that at least the foundations of human knowledge are innate, having been pretty soundly convinced of that by Immanuel Kant’s The Critique of Pure Reason when I was 19. Kant’s examples are better, and they crucially deal with general cases, nothing specifically to do with language. The point is that a lot of what we “know” about the outside world is innate, including, for Kant, even the notions that we exist in space and time (yes, I buy his argument there too - but that’s a subject for another post - and probably on a different blog).

This mathematical example is, again, kind of a strange choice for an argument about Universal Grammar, considering it has implications for epistemology in general. In fact, this is a nicer illustration of what Russell called the “Problem of Induction,” in my opinion, than Russell gave himself. (Or, actually, maybe Russell did give this example somewhere else, but I’m more familiar with the famous chicken example from Chapter VI of “Problems of Philosophy”).

Here’s Russell:

And this kind of association is not confined to men … Domestic animals expect food when they see the person who feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.

The chicken’s problem is of course the same as that of the child in Carnie’s example. The child has only ever seen a pairing of a number with itself and thus expects this pattern to continue - but she has the wrong idea about what the pattern actually is. Just as the chicken expects the pattern of its being fed when it sees the farmer to continue - and continue the underlying pattern does, though the chicken was mistaken about the nature of that pattern.

But Russell continues:

But in spite of the misleadingness of such expectations, they nevertheless exist. The mere fact that something has happened a certain number of times causes animals and men to expect that it will happen again. Thus our instincts certainly cause us to believe the sun will rise to-morrow, but we may be in no better a position than the chicken which unexpectedly has its neck wrung. … The problem we have to discuss is whether there is any reason for believing in what is called ‘the uniformity of nature’. The belief in the uniformity of nature is the belief that everything that has happened or will happen is an instance of some general law to which there are no exceptions. The crude expectations which we have been considering are all subject to exceptions, and therefore liable to disappoint those who entertain them. But science habitually assumes, at least as a working hypothesis, that general rules which have exceptions can be replaced by general rules which have no exceptions. ‘Unsupported bodies in air fall’ is a general rule to which balloons and aeroplanes are exceptions. But the laws of motion and the law of gravitation, which account for the fact that most bodies fall, also account for the fact that balloons and aeroplanes can rise; thus the laws of motion and the law of gravitation are not subject to these exceptions.

And I think if Carnie had continued along something like these lines, we would be in a better position. Because this is indeed what we syntacticians are doing. Languages are remarkably the same - though they may not appear so on a superficial glance - the world over in terms of syntactic phenomena. It is indeed striking enough that we would like to, if possible, capture these similarities in terms of universal laws that admit of no exceptions, that indeed explain the superficial exceptions. Where I can’t follow this argument is to the point that these laws arise from some biological specialization for language.

Surely Carnie is falling victim to his own trap here. Granted, a biological specification for language is the superficially most plausible explanation. Laws of gravity are laws of objects, laws of language are things specific to a human-produced communication system, so it seems reasonable to look for their cause in biological specialization. But there is no reason we should necessarily look there. There are plenty of other plausible explanations - most notably that UG phenomena could be explained out of simple biological economy using something roughly akin to our explanation for the fact that highly frequent items are more likely to be irregular (because it’s too much trouble for people to remember exceptional forms for words they seldom use, so they devise rules for the past tense of things like “disarm” but are happy to use “went” as the past tense of a frequent item like “go”). In Carnie’s terms, there is an underlying pattern, but we’ve no way of knowing what it is, exactly.

To the extent that Carnie is merely offering some background about learnability for the purpose of advancing UG as a plausible working hypothesis, he’s on solid ground and I support him. This would be something similar to Pinker’s defense of the idea that research should be done into innate cognitive gender differences, even as Pinker himself remains uncommitted as to whether there are such things. Unfortunately, that this is not all Carnie’s doing is made clear by the transition to the next section:

The evidence for UG doesn’t rely on the logical problem alone, however.

Just like that - as though he’d even bothered to present any logical problems that related to language learnability exclusively, i.e. that were not general epistemological problems of knowledge of the outside world in general.

These sorts of things are risible to me since I don’t really see the need for a biological UG to justify the study of syntax in the first place. The fact is, there are syntactic regularities, and this can easily be demonstrated by appeals to the students’ own native intuitions about the classroom language. Where those regularities come from, ultimately, is an interesting question, but it is not a question for syntacticians. Psycholinguists (and neurolinguists, for that matter, to the extent that there really is a language-specific UG) are much more qualified to address those issues than we are. Our job is merely to model how the system works, to describe those regularities that require careful attention to uncover. That these are numerous and subtle enough to justify a field of inquiry has been amply demonstrated over the last 40-50 years to anyone who cares to glance through LI. The “learnability problem” really isn’t, or shouldn’t be, our main preoccupation.

In particular, given the multiplicity of possible explanations for how children acquire the subconscious knowledge of their language they acquire, both proposed and yet-to-be-proposed, and given the highly specialized psychological or information-theoretic or biological knowledge that will be necessary to adjudicate between them, it seems silly to ask people armed only with Chomsky to pronounce an opinion on the subject one way or the other. Certainly biological UG is a plausible working hypothesis, but it is only one of many, and we just don’t have the information before us to venture much more than a guess as to the nature of UG at this point.

Now, I did say that Carnie continued on in another section. So maybe there are better arguments there?

No such luck, actually. Carnie’s next item for consideration is the that-trace effect. Given this pair of sentences:

(a) Who do you think that Ciaran will question first?
(b) Who do you think Ciaran will question first?

a reasonable conclusion for a learner is that complementizer “that” is simply optional. A further addition to the dataset seems to confirm it:

(c) Who do you think will question Seamus first?

So again, we have one of “these pattern things.” The underlying pattern appears to be that you can simply omit “that” if you feel like it. But then along comes something that causes us to question this conclusion:

(d) *Who do you think that will question Seamus first?

Mysteriously, in English, complementizers are prohibited when it’s the subject that’s been extracted. How do children learn this? After all - it’s hugely implausible that they’ve ever heard sentence (d) and been told it’s wrong. Clearly ungrammatical sentences of this kind generally fail to be produced at all. So how do they learn it? Must be UG, right?

Well, again, maybe. But equally plausible seems to me that whatever subconscious model of language they’ve formed from the sentences they’ve heard simply predicts this for them. To cite an example from my own life - when I first learned to play Shogi (Japanese chess), no one told me the rule that you can’t have two pawns on the same file. In Shogi, you see, you can drop pieces that you’ve captured onto the board in place of making a move (so capturing pieces is more like “converting” them to your side). But there’s a ban on using this feature to simply line up masses of pawns on the same file. No one told me that specifically, but when my opponent did it, I turned to the person teaching us the game and asked if it was legal. Surprised, he said that it wasn’t, and then asked if I’d ever played Shogi before, since he could think of no other explanation for how I “knew” that. But I hadn’t played before - it just “felt wrong” is all. Point being, I think it’s really hasty to rush to conclusions about language-specific innate brain modules just because children are able to generalize from the pieces of the system they’ve acquired so far to pieces no one has explained to them. Clearly, there is some kind of innate reasoning ability over systems and rules, but it doesn’t have to be specific to language, and it certainly doesn’t have to be as specific as a parameter setting for the that-trace effect. Again, I would stress that the job of the syntactician possibly includes raising these questions, but definitely not answering them. We simply describe the system in as much detail as we possibly can - and maybe (hopefully!) something about the regularities we uncover will give psychologists and biologists a clue as to how children do whatever it is they do.

I don’t want to be too critical of Carnie’s book in general, I should add. There are some sections of the first chapter that I really appreciate - particularly the boxes on pp. 10 and 12 responding to common criticisms about the existence of rules and the validity of basing a science on “intuitions.” From the former:

… a brain is a mass of neurons firing, how can formal mathematical rules exist up there? Remember, however, that we are attempting to model Language, we aren’t trying to describe language exactly. … Obviously the rules don’t exist, per se, in our brains, but they do represent the external behavior of the mind.

Quite correct. There is a certain subspecies of phonetician/cognitive scientist (the kind that likes to refer to itself as a “language scientist”) that seems congenitally unable to grasp this point. Syntax is a model, not the final physical explanation. Grammaticality exists as a phenomenon in the world, and we try to explain its operation in as concise a way as possible. Entirely too many first-year graduate students in Linguistics come away with the idea that Syntacticians really honestly believe that there are explicit trees in our heads and that “movement” is a genetically-specified neurological operation. Poppycock, obviously, but because it is obviously poppycock and because they somehow form the impression in spite of us that we literally believe this, it’s easy to understand why we’re so often the objects of their ridicule. I appreciate that Carnie’s book takes the time to refute this view clearly. It shouldn’t have to, of course (responding to straw men isn’t really in an intro textbook’s job description, after all), but it does everyone a service by recognizing the need and addressing the issue anyway.

I just wish we could drop all the talk about UG. Yes, it’s there in some sense, but it’s like Global Warming, really. We know it’s happening, but we don’t know to what extent, what the implications are, or even exactly what the mechanism is (there’s still debate as to the extent of human complicity), let alone what to do about it. Yes, there’s some sense in which there are universal grammar rules for all human languages. But we don’t know exactly where this comes from, on what level it operates, or even how pervasive it is in the real explanations for these similarities. So let’s stow it, please, until people qualified to address these questions can do so.

Unintentional Self-Parody

Posted on June 7, 2008 by Joshua.
Categories: misc.

An actual quote from a blog about language:

The fact that “one of the only” is a common phrase, found everywhere, does not make it acceptable English.

So what, one wonders, would make anything “acceptable English” for these people? Is there any mechanism for making something “acceptable English” OTHER than the fact that it “is a common phrase, found everywhere”?

Alright, granted, if you split the bits apart, then it doesn’t make semantic sense for “one of” to be used with “only,” since “only” is only semantically compatible with single units, and “one of” implies plurality. I get it. But neither does “kick the bucket” have anything obvious to do with death, and yet I’m in the habit of parsing it that way rather than in its potential literal meaning. I could rattle off hundreds of similar examples, but why bother? Anyone with even a passing knowledge of Linguistics is aware of chunking, aware of idioms, and bloody well aware that native speaker intuitions are the measure of grammaticality in any language, not dusty semantics books about “the way things ought to be.”

Now, I’m not as dedicated an anti-prescriptivist as most linguists. I can see the case for lamenting a particular form if it destroys or blurs distinctions previously present in the language. One of my personal pet peeves on this score is the modern habit of using “utilize” as though it meant “use + I’m intelligent,” which it doesn’t, or didn’t used to. And I can see the case for lamenting use of a particular form if it is in some way deceptive, designed to cause false associations. Like calling socialists “liberals” when they stand for the opposite of relaxing controls on an economy. But in each of these cases the reason I am sympathetic to prescriptivists is that there is something to be gained in the space of semantic coverage by listening to them.

I simply can’t see what is to be gained by picking nits about “one of the only.” It’s a chunk. An idiom. We know what it means without having to open the hood and tinker with the bits. There is no semantic distinction that’s being blurred, and the standard-use meaning of “only” is not under any threat. Indeed, it is ironically partly the contrast with “one of the only” that sees to this. If I say “I’m the only one who passed,” then it’s clear to you that no one else but me passed the test in some part because I didn’t say (but potentially could have said) “I’m one of the only ones who passed.”

One could argue, if one were excessively silly, that “one of the only ones” imposes a cognitive burden because of a garden path effect whereby the listener must go back and reparse. (And I’m equally sure there’s a researcher with access to an fMRI gadget and too much government money willing to take colored pictures in red and blue of your brain and pronounce you CORRECT.) But this burden can’t be terribly large. Indeed, I’ve never been in the presence of any ambiguity caused by “one of the only.” It’s more the kind of thing that Gallagher would pick at for a cheap laugh. (You know, Mr. “we park on a driveway and drive on a parkway! It’s messed up!” Yuk yuk.)

So this is unintentional self-parody of the first order. Awesome.

Now That’s Good Advice!

Posted on by Joshua.
Categories: Uncategorized.

Admittedly not from a linguistic source, but cool nonetheless:

The plural of anecdote is not data.

Now if only we could get Sociolinguists in the habit of repeating that to themselves with their morning coffee every day…

What’s an Optimal Dictionary?

Posted on June 3, 2008 by Joshua.
Categories: Uncategorized.

Mark Changizi is something of a sensation after his recent SciAm appearances, so I picked up one of his papers - the one about economical hierarchy organization in dictionaries.

This is a really interesting bit of work - one of those “why didn’t anyone consider this before?” kinds of things. Of course, people have thought of it before, in terms of efficiencies in semantic hierarchies, but Changizi is the first I’m aware of to consider a dictionary as a language optimization problem.

I remember when I first fell in love with Computational Linguistics it was because of research exactly like this. I actually came to IU to study Cognitive Science, something like Psycholinguistics, actually. I sat in on a CL class when I really should’ve been taking Andy Clark’s Philosophical Foundations seminar (the two classes met at the same time) - his last at IU, as it turns out. But I’m glad I made the rash decision I did - because I got exposed to Zipf’s Law and Shannon’s Information Theories, and WOW! Something about the meta-scientific nature of it all really caught me. It was like applied philosophy. We weren’t exactly doing “down and dirty” empirical research, but neither were we playing with building blocks spawned by Chomsky’s imagination. The idea that there were mathematical limits on language production seems obvious in retrospect, but at the time it was a kind of revelation.

Changizi takes me back to some of that.

The question is this: what would an “optimal dictionary” look like? We can speculate that it would have two characteristics.

First, it would involve an ideal tradeoff between expressive power and compression. That is, it would need to be as compact (in terms of the number of items needed to define all its entries - think of it as “total size”) as possible without giving up on complete coverage of the data (i.e. all the words attested in the language). This desideratum has to mostly be studied in terms of hierarchical levels - for two reasons. The first of these is a priori - it’s just in the nature of a “good dictionary definition” that it defines its entry in terms of less-specific, more abstract terms. The second of these is realistic: since there’s no point in setting an upper bound on how many concepts an ideal language needs (it needs as many as people find useful, obviously), and no way to measure how well the concepts in use in a language are covered by the words (no one, that I know of, has a model for “conceptual redundancy” between items that can be tested), the best we can do is hypothesize that the optimal vocabulary will arrange its hierarchy of words in such a way that dictionary coverage is “compact” in the sense of “shortest token-length definitions without sacrificing coverage” mentioned earlier.

Second, it would involve a “strict definitional hierarchy.” That is, there would be a set of “atomic” words which are not defined in terms of any other words (something like the “semantic primes” of Natural Semantic Metalanguage theory), and each level in the hierarchy would draw only from words in the immediately preceding level.

With these assumptions, Changizi is able to lay out some empirically testable conditions for “economy” in dictionaries like the OED. First, it should have an “optimum” number of levels in its hierarchy.

To aid in understanding this concept, consider a binary alphabet - 0 and 1. Obviously there are four possible two-letter words we can make with this: 00, 01, 10, and 11. By the same reasoning, there are 16 four-letter words we can make (I won’t spell them all out - exercise for the reader and all that). But what if we have an intermediate level in which all four of the two-letter words are represented by letters? I.e. a = 00, b = 01, c = 10, d = 11. Well, we can still get our target output of 16 possible words with these letters, but the words themselves only require two characters of storage space rather than four (because they are “ab” instead of “0001,” etc.). So to store 16 “concepts” in a “dictionary” that only allows definitions in terms of strings of the original two “semantic primes” (0 and 1), we need 64 characters (16 strings of four letters each). But with an intermediate layer of letters that act as stand-ins for combinations of the primes, we can store the same 16 “concepts” with only 44 characters (8 characters to spell out the two-letter definitions of the four letters on the intermediate level, 4 for the letter labels themselves, and 32 for the 16 two-letter definitions on the output level - that’s 8 + 4 + 32 = 44). So intermediate levels “optimize” the size of a dictionary by reducing storage space.
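For anyone who would rather check the counting than take my word for it, here is a minimal Python sketch of the toy example. The function names (`flat_cost`, `layered_cost`) are my own inventions for illustration, not anything from Changizi's paper, and the bookkeeping simply follows the 64-versus-44 tally described above.

```python
from itertools import product

def flat_cost(primes, word_len):
    """Characters needed to define every word directly as a string of primes."""
    words = ["".join(p) for p in product(primes, repeat=word_len)]
    return sum(len(w) for w in words)

def layered_cost(primes, labels):
    """Characters needed with one intermediate level (toy bookkeeping only).

    Each label stands for one two-prime string, and each output word is
    then defined as a two-label string.
    """
    pairs = ["".join(p) for p in product(primes, repeat=2)]
    middle = dict(zip(labels, pairs))                 # a=00, b=01, c=10, d=11
    outputs = ["".join(p) for p in product(labels, repeat=2)]
    definition_chars = sum(len(d) for d in middle.values())  # 4 definitions x 2
    label_chars = sum(len(label) for label in middle)        # the 4 labels
    output_chars = sum(len(w) for w in outputs)              # 16 words x 2
    return definition_chars + label_chars + output_chars

print(flat_cost("01", 4))           # 64
print(layered_cost("01", "abcd"))   # 44
```

Note that the tally counts the four letter labels as part of the layered cost but charges nothing for naming the 16 output words in either scheme. That is just the accounting used in the example, not a claim about how the paper measures size.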

The relevant correlate in a natural language dictionary is meant to be “hypernyms,” that is, words like “vehicle” that cover for “car” and “buggy” and “rocket ship,” etc.

So, crunch some math and you find that to capture a vocabulary of 150,000 or so words (i.e. the size of an average pocket dictionary), the “optimal” number of levels is 7. Anything between 5 and 10 levels would be within 10% of optimum, actually. But any number of levels more or less than 7 is less efficient at reducing dictionary size while maintaining coverage.

A second prediction involves the “growth factor” by level. Returning to our example, we saved ourselves 20 characters by adding an intermediate level (down to 44 characters from 64). This savings can be captured in terms of a “level-level combinatorial growth exponent.” In the original example - where we had an alphabet of 2 and an output of 16 and only two levels (the alphabet and the output layer), this exponent is obviously 4 - because 2 times 2 times 2 times 2 is 16. Another way of saying it is that to get from the input layer to the output layer, we need four-character definitions, or we need the size of the dictionary to grow by a power of 4.

When we put in the middle layer, however, it’s not so dramatic. Now we need a factor of only 2 to define the middle layer (2 times 2 to get us the 4 words of the middle layer), and a factor of 2, in turn, to define the output layer from the middle layer (4 times 4 is sixteen = we need two letters each of a four-letter alphabet to define 16 words).
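The exponents in the toy example can be checked with a few lines of Python. This is only a sketch under my own reading of the example, namely that the exponent between two adjacent levels is the power you raise the smaller level's size to in order to get the larger one. The names `growth_exponents` and `uniform_exponent` are hypothetical, and I make no claim that this is exactly how the paper defines or estimates its exponent.

```python
import math

def growth_exponents(level_sizes):
    """Exponent p for each adjacent pair of levels, where next = prev ** p."""
    return [math.log(b) / math.log(a)
            for a, b in zip(level_sizes, level_sizes[1:])]

def uniform_exponent(atoms, target, transitions):
    """Single exponent p that, applied at every transition, turns `atoms`
    into `target` words: atoms ** (p ** transitions) == target."""
    return (math.log(target) / math.log(atoms)) ** (1.0 / transitions)

# Two levels only: the alphabet (2) and the output layer (16).
print([round(p, 6) for p in growth_exponents([2, 16])])     # [4.0]

# With the middle layer: 2 -> 4 -> 16.
print([round(p, 6) for p in growth_exponents([2, 4, 16])])  # [2.0, 2.0]

# The same answer from the "uniform exponent" direction.
print(round(uniform_exponent(2, 16, 2), 6))                 # 2.0
```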

So by adding the middle layer, we drop our growth exponent from 4 to 2. Changizi hypothesizes that we can then make a second prediction based on this. Remember, our original prediction was that there would be 7 plus-or-minus two (hmmmm… where have I heard that before?) levels in the “optimal” dictionary’s hierarchy. If that’s the case, and if every level contains a uniform definitional length, then we can guess that an “optimal” combinatorial growth exponent should be 1.3. (I’ll leave the number-crunching to the reader - either that or look it up in the paper. Or trust me that it comes out right - I checked it.)

The third prediction is asserted rather than justified: namely, that thing about strict hierarchy I mentioned earlier. Each level should only use words in the level immediately before it in its definitions.
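To make the strictness condition concrete, here is a small Python sketch of what checking it might look like. The data layout (a list of levels, each mapping a word to the list of words in its definition) and the name `is_strict_hierarchy` are assumptions of mine for illustration only. Real dictionary data would need a lot of preprocessing to get into this shape.

```python
def is_strict_hierarchy(atoms, levels):
    """Return True if every definition uses only words from the level
    immediately before it (the atoms, for the first level)."""
    allowed = set(atoms)
    for level in levels:
        for word, definition in level.items():
            if not all(token in allowed for token in definition):
                return False
        allowed = set(level)  # the next level may draw only on this one
    return True

atoms = {"0", "1"}
middle = {"a": ["0", "0"], "b": ["0", "1"], "c": ["1", "0"], "d": ["1", "1"]}

strict = [middle, {"w1": ["a", "b"], "w2": ["d", "c"]}]
mixed = [middle, {"w3": ["a", "1"]}]  # reaches back down to an atom

print(is_strict_hierarchy(atoms, strict))  # True
print(is_strict_hierarchy(atoms, mixed))   # False
```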

Alright, so these are all very cool concepts and have given me a lot of brain food over the past day. But …

… here comes the goofiness.

Leaving aside the issue of whether Changizi actually finds this kind of structure in the OED or not (he claims to - but I have grave doubts about his methods, and graver doubts about the veracity of his conclusions), a lot of these assumptions simply don’t make sense for natural language. The most obvious being - why should a natural language restrict itself to the strict hierarchy? Continuing with the earlier example - let’s say we have a concept that, rather than being 0001, i.e. three parts ‘0’ and one part ‘1’ (whatever that means in semantics!) - or “ab” in terms of our intermediate level - is just “001”? Put differently, what if we have a concept that is “a1”? This is something of a problem for the funness of the model, not only because it doesn’t let us crunch our simple exponential growth numbers anymore (for variable-length definitions at each level, we have to do more complicated math), but also because it introduces ambiguities. “001” can, after all, be represented as “0b” or “a1” equally well. So there are obvious model-theoretic reasons why we would want to prevent this - but I can’t think of any compelling real-world reasons to make these assumptions. Indeed, I can think of compelling reasons to make the opposite assumptions. If we’re playing with an alphabet of semantic atoms (which Changizi gives good reasons to assume should have 10-60 members for English), it seems like we would want as many combinations of these atoms as the system will allow, for maximal expressive power.

Indeed, the whole point of Zipf’s Law is to demonstrate that variable length in phonemic specification is something of an optimization. Zipf doesn’t predict that we’ll have a set of pronounceable words of all the same length in human languages! Quite the contrary - the fact that some of these words are shorter than others is the result of a tradeoff between effort and specificity. Frequent words should be maximally abstract and maximally short. Infrequent words should be more specific and longer. I see no reason why this shouldn’t be just as true of a semantic-conceptual space as it is of a phonemic-lexical space. We would expect that some concepts in use in natural language should be “more specific” than others, and that these concepts should involve longer definitions than others. Now - in theory Changizi has this in the form of the intermediate levels. But I see no reason why the intermediate levels should be prohibited from deviating from the ideal exponential growth factor of 1.3 at each level. That prediction amounts, really, to predicting systematic gaps in the lexicon at predictable levels of conceptual specificity. I need more convincing before I believe that such a thing is even an “optimization characteristic” of language, let alone that it actually obtains in English!
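The ambiguity that shows up once definitions may mix levels is easy to see by brute force. Here is a minimal Python sketch that lists every way of writing a string of primes using any mixture of primes and intermediate-level letters. The function name `spellings` and the label table are hypothetical, built only from the toy example's a = 00, b = 01, c = 10, d = 11.

```python
def spellings(target, labels):
    """All ways to write `target` (a string of primes) using primes and labels."""
    if not target:
        return [""]
    results = []
    # Option 1: spell the next prime out directly.
    results += [target[0] + rest for rest in spellings(target[1:], labels)]
    # Option 2: replace a leading chunk with the label that stands for it.
    for label, expansion in labels.items():
        if target.startswith(expansion):
            tail = target[len(expansion):]
            results += [label + rest for rest in spellings(tail, labels)]
    return results

labels = {"a": "00", "b": "01", "c": "10", "d": "11"}

print(spellings("001", labels))   # ['001', '0b', 'a1']
print(spellings("0001", labels))  # ['0001', '00b', '0a1', 'a01', 'ab']
```

With mixing allowed, “001” gets two competing mixed-level definitions (“0b” and “a1”). Under the strict hierarchy only the all-label spellings would be legal, which leaves “ab” for 0001 and nothing at all for 001.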

Now, I suppose the argument here is that dictionaries should organize themselves in this way, not that the conceptual space necessarily should. But I don’t see how we can get away from the idea of a dictionary as a proxy for the conceptual space. Doing so would be like saying that the dictionary sacrifices accuracy for the purpose of making itself optimally small. But there is no reason whatever to believe that such a process would happen in the real world. The purpose of a dictionary is to accurately record all words in use. The economic considerations of paper saving seem trivial in cost compared with the economic fallout from delivering a product that doesn’t live up to its stated purpose. It would be like worrying first about saving on metal and only a distant second on speed in designing a racecar. True, in the general case we have reason to believe that skimping on metal will make the car lightweight and presumably faster, but the ultimate purpose is to build something that goes fast, and if there are cases where spending a bit more on a bit more metal will accomplish that goal, then damnit that’s what you do!

In short, I find this paper a valuable first step, with massive bonus points for “interesting concept,” but I’m still skeptical. It seems to me that this is a much harder problem in reality than the model here can capture. It also seems that a lot of the assumptions need further thinking. We might be crossing domains - imposing constraints from one domain in an improper way on another. It’s going to take more sophisticated math to solve this problem properly, I think, and a more thorough explication of the assumptions before I’m completely convinced they’re on the right track.

However, as a thought experiment this paper is inspiring, and I do believe it lays the groundwork properly. This is an interesting question, and the answer given here is almost certainly on the right track, if maybe not complete.