on Jun 10th, 2009

Avoiding the Kool-Aid

For the past day I’ve been having a frustrating email discussion with a colleague who has drunk too much of the Computational Linguistics kool-aid. And by “the kool-aid” I mean the notion that nothing is science unless you can measure it, and that anything that’s not science is somehow voodoo.

The particular topic has to do with collective noun agreement patterns in British English. As we all know, the agreement patterns of nouns in English (of any variety) can be complex and confusing. See this Language Log classic for a crash course in the issues. We also know that British English is somewhat more complex on this front than American English in that it generally allows plural agreement with singular nouns denoting an entity comprised of individuals in those cases where the speaker is emphasizing the members of the group over the group itself. So, to take an example from the Economist’s Style Guide - you can say things like “the preceding generation are all dead.” What allows it in this case is that the dying was done by all the individuals in the generation individually, and that the generation as a whole can’t be dead until all of its members are. Since the emphasis here is on the members more than the group, plural agreement is allowed. By the same token, the Economist’s style poo-bahs seem to think that “the me generation has run its course” should take singular agreement - since the emphasis is on the group as a political/cultural entity rather than as a collection of individuals. All of which is to say that British English encodes in its syntax a semantic distinction that American English doesn’t necessarily. The cliche is that British English sacrifices syntactic consistency for semantic accuracy or richness of expression, where American English prefers grammatical consistency and leaves it up to context to sort out the emphasis.

Of course, since this is language, the truth is that neither variety is completely consistent here. Americans do sometimes use plural agreement with collective entity nouns - we just don’t do it nearly as often as the British do (indeed, in most cases it sounds weird). And the British, for their part, disallow plural agreement in some cases where consistent application of the “semantic emphasis” rule would prescribe it. For example - you can’t say “England have just voted resoundingly to send Labour packing” (I’m writing this in early summer 2009!) - it’s “England has.” Period.

In fact, the dispute I’m having with my (equally American, though one frequently suspects he wishes he weren’t) colleague is about my suspicion that the allowance for plural agreement on collective entities is more permissive in some domains than others. In particular, I have the impression that I hear it a lot more in British sports commentary than I do elsewhere. Granted, I’m not a native British English speaker, but I do watch a lot of British TV and read a lot of British media and so come into contact with their speech patterns a bit more than the average American. And my gut feeling on the matter is that they’re generally almost as consistent about using singular agreement as we are - save for these distinguished domains.

I got some pretty unexpectedly straightforward confirmation of this from Brian Micklethwait’s blog - where in an entry about Cricket he writes the following:

Well, England have survived. Had I put ‘England has survived,’ you’d know that this was about something important, like the recent elections we’ve been having, but ‘have’ means it can only be sport, and indeed it is.

In other words, this native speaker (one presumes, with a name like “Micklethwait”) of British English is reporting that he can sense the topic domain based on whether or not “England” gets singular or plural agreement, and he further seems to expect his readership to share this intuition. Pleased, I sent the quote in an email to said colleague - who replied with the snarky response that “data is not the plural of anecdote.”

Well, no, it’s not. Actually, “data” isn’t the plural of anything in American English as far as I can tell. The link goes to a previous post where I admit that in gradschool I’ve picked up the pretentious “the data are consistent with x” a bit. But the decisive point for me is that I don’t have “datum,” which is supposedly the singular form of “data.” I can’t even say that word with a straight face, and I would never actually use it in a sentence (i.e. “this datum says one thing, whereas that one says another.” GOOD GOD ALMIGHTY NO!). No - I talk about “data points” and “pieces of data” when I want “data” to be just one. Which is consistent with other mass nouns - “a line of coke,” for example.

But alright, nitpicking aside, my colleague is, of course, right that no single testimony from any single Englishman constitutes scientific certainty that the collective entity plural is more prevalent in some domains than others. But this is what I mean about how annoying this “if I can’t measure it it isn’t science” obsession among computational linguists is. I wasn’t presenting it as science! It was just a nice bit of corroboratory evidence that I found on the internets, that’s all! Just as no behaviorist ever seriously measured the sweat on his palms or the increase in his heart rate to determine whether he were really in love with his wife, surely we linguists can satisfy our natural curiosity about language outside the laboratory without always and everywhere having to prove the case to scientific certainty? I wouldn’t attempt to publish a peer-reviewed paper on the basis of Mr. Micklethwait’s testimony, no - but it’s enough to answer what personal questions I have about this phenomenon for myself.

When I said as much to my colleague, suggesting that we use the word “evidence” rather than the more domain-specific “data,” he replied that he still disagreed. Which is, frankly, just A LIE. Now, granted, as an aspiring dialectometrist, he has a certain justifiable prejudice here. Since he measures (erm, aspires to measure) dialect differences for a living, naturally he doesn’t want to encourage sloppy intuition-based discussion of dialects. Notwithstanding - it’s just A LIE to say that none of his opinions whatever about how people use language are informed by anything outside of scientific data collection! In fact, MOST of his opinions about language use - up to and including the way he himself uses his own language - are formed outside of controlled studies.

Now, of course it’s possible that we were simply talking past each other, and he was thinking of the exchange as an exchange between language professionals rather than the more informal way in which I intended it. All the same, it illustrates something very important that’s wrong with the way the kool-aid drinkers view the world. The fact of the matter is that people consult their linguistic intuitions every day in order to make sense of the input they receive. Even if Mr. Micklethwait is some kind of bizarre aberration who has acquired the “England have = sport” vs. “England has = politics” distinction completely by accident and totally at odds with how most Britons use their language, the fact remains that he has intuitions about this (possibly phantom) distinction and makes interpretive decisions on the basis of them. Which is what we all do every time we hear or read any kind of linguistic utterance, in fact. Intuitions ARE linguistic data. Not only are they valid linguistic data, they are the PRIMARY linguistic data. Corpus studies are approximations to actual use. They have certain efficiency advantages for certain kinds of studies, yes. They also have the advantage of yielding quantifiable results, which is a sine qua non for certain kinds of applications, yes (especially when comparison between two approaches to an engineering problem is what’s at issue). So corpus studies are indispensable - no arguments from me. But the fact of their being “indispensable” does not allow one to conclude - as so many do - that they are the whole of the law. They are not. At best, they are a convenient tool. But ultimately, corpora are NOT what we Linguists study! We study language use, and language use is an intuition made concrete as an utterance. Intuitions ARE the data.

My colleague can well insist that we want more than just Mr. Micklethwait’s word on this, of course. But how many does it really take to draw a conclusion here? Notice that Mr. Micklethwait isn’t just insisting that he has this intuition, he clearly expects his audience to share it. And since he is, as far as I can tell, a successful speaker of British English - that is, he communicates daily with other Britons, more often than not understanding and being understood - his intuitions about what intuitions other speakers have can’t be too far off the mark. Linguistic communication, after all, involves arranging (some might say “merging,” heh heh) and inflecting lexical items in accordance with a shared communication system. We put the words in this order with this inflection and this intonation because we expect that the person we’re talking to, who has internalized the same rules, will decode them in the obvious and expected way. How else do the kool-aid drinkers think this process works?

We’ve been holding a summer version of the Parsing Reading Group that is rapidly devolving into a Machine Translation Reading Group instead. Since all the reading material consists of papers by kool-aid drinkers, we get a first-hand look at a lot of their follies. For example, one member is fond of repeating an account of someone (Franz Och?) claiming at a conference that, because his machine translation system outscored some human translators on the BLEU score, it has “super-human” performance. Obviously, this is laughable (and maybe he meant it as a joke). But while the papers don’t quite take it to that extreme, they still labor under the delusion in a lot of cases that they’re getting closer to human-like performance. They’re not. Even a shallow glance at a statistical machine translation system should be enough to convince someone that whatever it’s doing, it’s NOT mimicking the process by which human bilinguals translate between languages. A quick look at the errors should lay any such vanity to rest. Statistical systems make mistakes that humans simply don’t. Under ANY circumstances. But it’s more than that. The method by which they arrive at even their correct results can’t be entirely right either. Sure, there’s something right about it, and some aspects of what they do must be correct (for example, taking a word in the source language and looking up a translation for it in a lexicon). But humans, when translating, almost certainly don’t form a list of hundreds of possible translations that their mental corpus of utterances heard over their lifetime would suggest are valid and then, taking context and frequency into account, score them and pick the best fit! Bullshit!
What they almost certainly do is exactly what the hated Noam Chomsky would suggest: they have some kind of super-abstract starting point which selects an array of lexical items in the target language on the basis of semantic intention, and they then apply the rules of that language to assemble them into a meaningful utterance. Just so.

So sure, if you’re a dialectometrist and interested in quantifying the differences between dialects, then Mr. Micklethwait’s testimony alone will not be sufficient. But this is because it is incompatible with your quantitative goal. It is NOT because Mr. Micklethwait is by himself an inadequate example of how the dialect is spoken. Quite the contrary - as someone who regularly uses the dialect himself with great success he is a perfectly adequate example of how it is spoken. Generally, of course, we like to check with a few more native speakers to make sure - just because we know that there are sometimes individual idiosyncracies. But the suggestion that I don’t know anything about the dialect until I’ve run a controlled statistical study over a corpus of a large number of its speakers is bogus and deceptive. That corpora are the only available way to study certain kinds of phenomena does not in any way imply that they are the only way to study any or even most kinds of language phenomena. It comes down to this: are you more interested in your self-image as a hardened scientist, or in actually learning something about natural language? Sadly, for many computational linguists, it seems to be the former.

on Feb 14th, 2009

Gradience vs. Phonology: Fight!

Yesterday’s Speech Research Laboratory presentation was by MIT’s Adam Albright, under the title Natural classes are not enough: Biased generalization in novel onset clusters. The link goes to a PDF of the paper on which the talk was based.

I’m not a speech researcher; I’m not really even interested in Phonology, let alone things like Phonetics and Auditory Perception. But I went to this one because it bears on a debate that everyone at IU is interested in by default - given the level of sheer idiocy with which the topic is often addressed here: the question of the existential status of symbol processing systems.

Basically, the problem is this. Linguistics has traditionally framed its explanations in terms of symbol processing systems. Which is to say, the fact that The books I shelved without reading were old is grammatical and I shelved the old books without reading them is also grammatical but *I shelved the old books without reading is, perhaps unexpectedly (given the first sentence), not, is usually explained with reference to absolute rules that operate over classes of words, rather than the words themselves. And the fact that while neither blick nor bnick nor bzick are actual words of English, native English speakers seem to feel that the first definitely could be, the second maaaaybe, but no way for the third, tends to be explained by reference to constraints that refer to classes of sounds, rather than to generated soundwaves directly. In the first case, the issue is that “reading” needs an object, and in the first sentence there is reason to believe that “books” is that object and has simply “moved” to a different position in the sentence than the one it “started out in,” where “them” is the object in the second, but in the final sentence the trouble is that there is a ban on “books” having moved from the object position and so “reading” has no object at all. This explanation does not rely on anything like the sound sequence of “books,” the frequency with which it is used by English speakers, at what volume the speaker is speaking, how quickly he speaks, what other sentences he has used leading up to this one, his relative state of health and nutrition when making the utterance. Rather, the hypothesis is that he has knowledge of a highly abstract set of grammar rules that determine which sentences are members of his language and which are not, and that he applies these like a computer to determine what makes sense in his language and what doesn’t. Likewise for the three nonwords. 
The traditional explanation isn’t too concerned with how many other words in English start with bl bn or bz sound combinations. Rather, it posits that sounds are members of feature-defined abstract classes that either do or don’t interact well with each other.

There is plenty of reason to think that the traditional symbol-based explanations are missing some important pieces of the puzzle. And in fact, as far as I can tell, virtually everyone does believe that the traditional symbol-based approaches are missing important pieces of the puzzle. Including, in fact, the people like Noam Chomsky and Morris Halle who were largely responsible for steering Linguistics toward the symbolic approach in the first place. Of course, a lot hinges on what you mean by “pieces,” and how big “the puzzle” is supposed to be.

Someone could easily write a multi-volume discussion on what the proper object of study of any field - Linguistics included - actually is, of course, which is why it always seems silly to me to get too bogged down in the debate about whether or not it is appropriate for Linguists to be framing their explanations as symbol systems or not. Just as it becomes difficult to talk about a defendant’s guilt or innocence of a particular crime if you have to first say what “the meaning of is is,” it’s hard to get any real work done when you’re constantly fighting with your colleagues over what “Linguistics” means.

So my personal solution has always been for peaceful cohabitation. More often than not, these “disputes” about whether symbols are allowed or not are beside the point, an artefact of the fact that some people study phenomena that operate at a higher level of abstraction than the phenomena that other people study. If you’re only interested in the fact that blick is acceptable while bzick is not, then a symbol-based statement of the generalization that accounts for it will do just fine. If you start to get interested in the fact that bnick, while unacceptable, is still somehow better than bzick, then while you don’t necessarily have to abandon symbols, you might do better filtering symbol sequences with constraints rather than banning them with rules. And if you’re interested in whether there is anything about the vocal tract or the configuration of the brain that accounts for any of this stuff, then the rules and constraints will be a good pointer to the kinds of tests you need to devise, but ultimately your explanation can’t have anything to do with symbols or rules. In other words, symbols and symbol systems are and always have been theoretical abstractions. The same way that Python mediates between the human user and the bit-and-byte operation of the computer, symbol systems allow linguists to talk about the generalizations they’re interested in without having to draw sound waves or pictures of neuron clusters. Phonologists are not interested in how the constraints of Optimality Theory are actually instantiated in the brain as that is a job for Psychologists (and, perhaps eventually, Neuroscientists).
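To make the contrast between banning a cluster with a rule and grading it with a constraint a little more concrete, here is a toy Python sketch. It assumes a conventional sonority scale (stops below fricatives below nasals below liquids) and treats a bigger sonority rise across the onset as better. The numeric ranks, the `minimum_rise` threshold, and all of the function names are illustrative assumptions, not anything taken from Albright’s paper or from a worked-out phonological analysis.

```python
# Sonority ranks by manner class. The ordering is the conventional one;
# the specific numbers are an illustrative assumption.
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}

# Manner class for just the four consonants needed in this example.
MANNER = {"b": "stop", "z": "fricative", "n": "nasal", "l": "liquid"}


def sonority_rise(onset):
    """Sonority gain from the first to the second consonant of a CC onset."""
    first, second = onset
    return SONORITY[MANNER[second]] - SONORITY[MANNER[first]]


def rule_allows(onset, minimum_rise=3):
    """Categorical rule: an onset is simply in or out (threshold is hypothetical)."""
    return sonority_rise(onset) >= minimum_rise


def constraint_score(onset):
    """Gradient constraint: a larger rise counts as a better onset."""
    return sonority_rise(onset)


# Rank the three onsets of blick, bnick and bzick from best to worst.
ranked = sorted(["bz", "bl", "bn"], key=constraint_score, reverse=True)
print(ranked)
print({onset: rule_allows(onset) for onset in ranked})
```

The rule version can only say that bl is fine and the other two are out; the constraint version keeps the same symbols and classes but also orders bn above bz, which is the extra distinction the gradient judgements call for.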

The question of whether there is a grammatical “backbone” instantiated in the brain anywhere at all is more pertinent to Phonology than it is to Syntax. In the case of Syntax, it approaches the undeniable that there is such a system. Put differently, there is a very clean break between the algorithm and how that algorithm is actually “compiled into neurons.” Syntacticians study the algorithm. With Phonology, the field seems to get more and more suspicious that there could be anything like a “sound grammar” that can be studied apart from the soundwaves and articulators. But I think this in itself is revealing. Sound just so happens to be the kind of thing that is pretty amenable to quantitative study. It is comparatively easy to get people to agree to phonetic transcriptions of words in a language, such that reliable corpora can be built. There are spectrograms and such for measuring actual sound waves. And intuitions about whether strings of sounds are acceptable are generally uncomplicated by things like “on the present interpretation” and so on. This isn’t to say that syntax can’t be quantitatively studied, of course, just that it’s comparatively harder to do so - which is, if you think about it, something like equivalent to saying that abstractions are more appropriate to Syntax.

In any case, my impression of a lot of the attacks on formal Phonology of the kind that Bob Port and Adam Leary are fond of leveling is that to the extent that they invite discussion it is because (a) they are at least mildly successful at enticing people to subscribe to a straw man view of what “formal phonology” is, and/or (b) there are reasonable inroads available into mapping the connections between the abstract level of formal Phonology and the concrete performance acts of speech perception and production, and the availability of these inroads (ironically) makes it easy to blur the distinction between the two. The result is that we get drawn into a lot of misleading debate about whether Phonological models are valid, most of which involves simple haggling over the semantics of things like “formal” and “Linguistics” and “symbol” and “discrete.” What we get comparatively little of are demonstrations of how it is that Phonological categories are so persuasive if not real and what it is that accounts for the strange success of abstract Phonology if it is, in fact, an illusion.

Fortunately, there are some researchers out there (and of course Noah Silbert is about to grow up to be one of them) who aren’t so easily sidetracked. Which brings me back to Adam Albright. His talk yesterday was satisfying for precisely the reason that it had illuminating things to say about the mapping between performance and competence in Phonology without concluding that there was no Phonology. I’ll leave the details for the paper (linked above) - but the gist of it is that there are gradient judgements about the phonological acceptability of nonce words that cannot be easily accounted for by the usual appeals to statistical distribution. To the extent that you make an appeal to statistical distribution, it has to be filtered through a feature-based generalization over the data present. Finally, it turns out that there are some preferences that seem to defy the kinds of natural class generalizations that one might get automatically out of a corpus study of a particular language - but of course these are the kinds of things that Optimality Theory is tasked with noticing and writing down - i.e. “phonetic” constraints that are presumably true across all languages.

David Pisoni and Bob Port spent most of the talk smirking at each other like junior high school girls having “eye conversations.” What neither of them did was let any of the rest of us in on the joke by, say, asking Albright’s opinion of the apparently obvious alternative explanation for the data that they were privy to and the rest of us missed. What Pisoni did do at the end was make a funny joke about being on a “mission - like in the Blues Brothers” - but that in itself is a revealing formulation. In other words, Pisoni has made up his mind a priori to any actual evidence that only certain kinds of explanations are acceptable, and he will do the research it takes to convince everyone he’s right. Which is fair enough, really. I think we’re all aware that researchers start with personal biases - prejudices, really, in favor of some interpretation or the other of how the world works. Provided they do honest research to bolster their claims which they submit to due scrutiny by the community, the rest of us can at least try to evaluate the claims objectively, which is the name of the game in science. In fact, one of the best formulations of this idea that I’ve heard comes from an online criticism of Bob Port. Responding to this statement from (Port and Leary 2005):

There is only one route left to justify doing traditional generative phonology or for studying only the abstract sound structures of a language and deny the relevance of articulatory, acoustic and auditory details. It is to claim: We don’t care about linguistic behavior, only about linguistic knowledge. But there is no assurance that a coherent static description of knowledge exists just because that is what one wants to study.

Van Oostendorp writes the following:

For me the last sentence I quoted is a very important one. Of course there is no guarantee that we will be able to understand X just because we want to. That seems to me inherent in the nature of doing research — or of human existence, and it can hardly be a reason to give up.

Quite right. What is interesting is that Bob Port doesn’t seem to have any problem with the David Pisonis of the world taking exactly this attitude. It is only when people like van Oostendorp do it that it’s “wrong.” It is because Pisoni followed his hunch and did the grunt work that a lot of the “evidence” that makes Bob Port’s papers possible even exists. Perhaps that gives him some license to smirk like a schoolgirl rather than argue at talks that challenge his theories - I don’t know. But surely it’s preferable on both sides to simply amass evidence and discuss it dispassionately.

In any case, there can be no room for doubt that some level of abstraction is appropriate to any scientific inquiry. Science isn’t simply about “measuring things.” I found Albright’s talk satisfying because it presented the reality of phonological generalizations in terms that people like Pisoni and Port should have been able to appreciate, and should not have found objectionable. Indeed, it was a very useful study for precisely the reason that it gives these two camps a kind of level playing field where they can present their findings to each other in terms the other understands, accepts, and can challenge. In other words, it created a kind of framework for peaceful cohabitation.

Unfortunately, at least one of the two camps doesn’t seem interested.

A closing quote from van Oostendorp - just because I like it:

There is no ‘utter lack of evidence’ for the assumptions on which formal analysis of phonologies are based; there is plenty. Maybe we are not going to find it in the phonetics. But then, if we only would take phonological facts as evidence, there would not be a lot of evidence for many phonetic details: it would be a bit funny to conclude from this that these phonological facts put an unbearable empirical burden on phonetics.

Right.

In any case, Albright’s paper is to be recommended because it understands that one takes Phonology as far as it will go and no further. To the extent that it gives good explanations for things, it is useful. To the extent that findings fall outside its assumptions, appeals to other mechanisms (say, Phonetics) will have to be made. I don’t see what’s so hard to understand about that, or why it should generate controversy.

on Nov 11th, 2008

NPIs in Russian (are another excuse to compare HPSG and GB)

Yesterday’s reading for Syntax Reading Group was Asya Pereltsvaig’s Negative Polarity Items in Russian and the ‘Bagel Problem’, and it was interesting for a lot of the same reasons that I enjoyed last week’s paper.

Basically, the so-called ‘bagel problem’ is this. There are two (types of) negative polarity items in Russian that are in complementary distribution - the “ni”-items and the “libo”-items. Of course, being in complementary distribution means that they can’t appear in the same environment, but looking for the environment that conditions the choice turns out to be unexpectedly problematic. At first glance, the generalization seems to be semantic - to wit, that ni-items are “stronger” than libo-items. The distinction being made here between “weak” and “strong” basically comes down to how far one is allowed to take the semantic entailments associated with each item, and the difference between “weak” and “strong” items on these grounds is well-established cross-linguistically. “Strong” here means “antimorphic,” something of a complicated term to explain, but here goes.

Something is antimorphic when it meets all of the following conditions:

  1. f(X or Y) implies f(X) and f(Y)
  2. f(X) and f(Y) implies f(X or Y)
  3. f(X and Y) implies f(X) or f(Y)
  4. f(X) or f(Y) implies f(X and Y)

So, for example, never in English sets up an antimorphic context. In each of the pairs below, the first sentence entails the second.

  1. I never sing or dance IMPLIES I never sing and I never dance
  2. I never sing and I never dance IMPLIES I never sing or dance
  3. I never sing and dance IMPLIES I never sing or I never dance
  4. I never sing or I never dance IMPLIES I never sing and dance

Antimorphic, ladies and gentlemen!
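The four conditions are mechanical enough to check by brute force. Here is a minimal Python sketch, assuming propositions are modeled as sets of possible worlds (so “or” is union, “and” is intersection, and “implies” is the subset relation) and taking plain logical negation as the test case for f rather than never itself. The three-world universe and all the function names are made up for illustration.

```python
from itertools import combinations

# Toy universe of three "worlds"; its size is an arbitrary assumption.
WORLDS = frozenset(range(3))


def all_propositions(worlds):
    """Every subset of the universe, each standing for one proposition."""
    items = sorted(worlds)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


def is_antimorphic(f, worlds=WORLDS):
    """Check the four conditions for every pair of propositions X and Y."""
    props = all_propositions(worlds)
    for x in props:
        for y in props:
            conditions = [
                f(x | y) <= (f(x) & f(y)),  # 1. f(X or Y) implies f(X) and f(Y)
                (f(x) & f(y)) <= f(x | y),  # 2. f(X) and f(Y) implies f(X or Y)
                f(x & y) <= (f(x) | f(y)),  # 3. f(X and Y) implies f(X) or f(Y)
                (f(x) | f(y)) <= f(x & y),  # 4. f(X) or f(Y) implies f(X and Y)
            ]
            if not all(conditions):
                return False
    return True


def negation(proposition):
    """Negation as set complement: true exactly where the proposition is false."""
    return WORLDS - proposition


def identity(proposition):
    """A non-negative context, included only as a contrast case."""
    return proposition


print(is_antimorphic(negation))
print(is_antimorphic(identity))
```

Negation passes all four checks (they are just De Morgan’s laws read in both directions), while the identity function already fails the first one.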

Ok, well, a superficial look at Russian leads one to the conclusion that ni-items show up in antimorphic contexts and libo-items show up elsewhere (italicized because syntacticians like to call such things “elsewhere conditions” - that is, “if a certain condition doesn’t obtain, use me….” kind of thing).

There’s just one problem. It turns out there is one clearly antimorphic context where you use libo-items instead, contrary to prediction. This is the context defined by the preposition bez (‘without’). It’s easy to see that ‘without’ is antimorphic in English:

  1. He showed up without paper or pencil -> He showed up without paper and without pencil
  2. He showed up without paper and pencil -> He showed up without paper or without pencil
  3. He showed up without paper or without pencil -> He showed up without paper and pencil
  4. He showed up without paper and without pencil -> He showed up without paper or pencil

And so it is in Russian too. All these entailments hold. Unfortunately for the theory, bez can’t be used with ni-items.

Pereltsvaig’s solution is to notice that the only difference she can see between bez and all the other antimorphic contexts where ni-items appear is that bez is clearly never a complementizer at any level of the derivation. It’s clearly not a lexical complementizer, and neither is there any reason to suspect that it ever plays a semantic role as a complementizer (in Minimalist/GB-speak - sometimes called ‘GiBberish’ - of course you would say ‘it doesn’t raise to C at LF’). So once again - as with last week - you seem to have gotten your syntax in my semantics. The distinction between ni- and -libo is semantic, except when the licensing item (here bez) fails to be a complementizer, in which case you ignore the otherwise-convincing semantic distinction and use libo instead.

Sounds like a job for HPSG!

Of course, Pereltsvaig’s analysis is all about raising things at LF where we can’t see them, i.e. about as far from HPSG as it’s possible to be. But it’s an interesting question all the same. How would an HPSG account handle this?

My first instinct is to say that HPSG can’t handle it, that this is an example of where Minimalism/GB is superior, and that’s because the distinction is positional. But of course, as soon as that hits the screen I realize that it’s only because I’m trained in GB that I think of “what’s a complementizer?” as confounded with “where does it show up?” The idea that “complementizer” has a special position all its own is a GB prejudice. In HPSG, where sentence position isn’t explicitly encoded into the theory (rather, it “falls out” from the inventory of features - certain combinations turn out to be illicit), “complementizer” would just be a feature (erm, set of features) like everything else, and there wouldn’t be any talk of where it shows up (and certainly no talk of where it raises to!) at all.

So actually, HPSG could handle this just fine. And in fact, HPSG handles it a little better, because you don’t get into all the complicated (and probably unfounded) speculation about when words are “inserted” into the derivation. Pereltsvaig’s solution leverages Distributed Morphology - a theory that includes so-called “late insertion,” whereby lexical items aren’t actually inserted for pronunciation until the derivation is complete. (So ni- and -libo items, on the extreme version of this, might just be different pronunciations of the same item - though most Distributed Morphology people would say they are different items, and that you insert the more completely specified item - ni, since it’s “pickier” about where it goes - if you can, the other one - libo in this case - if you can’t.) In HPSG such an issue doesn’t even arise since all items carry with them their complete feature set, and there’s no derivation to speak of. There is no debate about when items get “inserted” as they were always just there, and the theory merely tells you whether they can appear in this order or not.
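The “insert the pickier item if you can, the other one if you can’t” logic is easy to sketch in Python. This is only a toy rendering of the elsewhere idea as summarized above: the feature labels `antimorphic` and `complementizer`, the `VocabularyItem` class, and the `insert_item` function are all hypothetical names, and the sketch makes no claim about how Pereltsvaig or Distributed Morphology actually formalize the choice. It also ignores whatever licensing libo-items themselves need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyItem:
    form: str
    requires: frozenset  # context features this item insists on


# Hypothetical feature specifications, for illustration only.
ITEMS = [
    VocabularyItem("ni", frozenset({"antimorphic", "complementizer"})),
    VocabularyItem("libo", frozenset()),  # the elsewhere item: no demands
]


def insert_item(context):
    """Pick the most fully specified item whose requirements the context meets."""
    candidates = [item for item in ITEMS if item.requires <= context]
    return max(candidates, key=lambda item: len(item.requires)).form


# A licenser that is antimorphic and complementizer-like.
print(insert_item({"antimorphic", "complementizer"}))
# A bez-style context: antimorphic, but never a complementizer.
print(insert_item({"antimorphic"}))
# Any other context.
print(insert_item(set()))
```

Because the elsewhere item makes no demands, it always qualifies as a candidate, so the function falls back to libo whenever ni’s requirements are not all met, including in the bez-style context.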

One of the original selling points of HPSG was that it bundled syntax with semantics (calling the items over which it operates ‘signs,’ the main grammatical feature of which is called ‘synsem,’ a combination of ‘syntax’ and ‘semantics’) - a move that seems sensible given that syntactic order is sensitive to and clearly affects semantic interpretation. So this would seem to be yet another plus in the HPSG column. But interestingly, reading this paper gave me some appreciation for LF, and why we might want such a thing.

Basically, that’s because semantics is positional in some sense too. Or, at least it’s convenient to think of it that way when we’re being lazy. I guess “positional” is maybe the wrong word - but we humans are in the habit of marking scope from outside to in when we draw up semantic formulae, and I suppose it therefore seems to us that there’s something positional involved. The rub came with some sentences in the footnotes where libo items appeared in what looked like a ni context. Pereltsvaig explained them away by noting that the items in question were D-linked (they have a discourse-level rather than a purely sentential interpretation) and thus in some semantic sense outside the scope of negation. And in that sense - that is, to the extent that we like to think of something as being “outside” or “inside” a scope - LF is actually kind of nice, because it represents the semantics of the sentence in the way we’re used to thinking about it. Things that “scope over” others “raise” above them in the covert movement phase of the derivation. So first you do your syntactic movement - that is, get the items arranged in a proper syntactic hierarchy. Then you linearize (erm…sequentialize - flatten the tree) this to pronounce it, and also read the semantics off it by allowing more determined movements of distinguished items out of the syntactic arrangement. This has the advantage of making a fairly clear prediction: the semantics/syntax interference only really runs in one direction. Syntax can get in the way of the success of semantics, but not the other way around. Syntax is “primary,” and you only worry about whether the interpretation makes sense once the syntax has been satisfied.

In HPSG, all this happens at the same time. And unfortunately I know hardly anything at all about how semantics works in HPSG, so I can’t really speak to any effect where syntax might block semantics or vice versa, or which direction (if any) the influence tends to run. But I did want to note that this is another area where the theories are not “mere notational equivalents,” as some have accused Chomsky of once having called them. How the syntax-semantics interface is handled in each theory is clearly different. In the so-called “Standard Theory,” there literally is an interface, such that the semantic component of the grammar takes a syntactic structure and operates on it. In HPSG, there is no such interface. Syntax and Semantics run in parallel, and failure of either to resolve at any time renders the sentence ungrammatical.

on Nov 9th, 2008Repent, and ye shall be forgiven

As a kind of followup to yesterday’s post - I notice that today on PrawfsBlog there is a post about whether to use “data” and “media” with plural agreement. Prawfs comes down firmly on the side of singular agreement - which I personally applaud.

‘Media’ and ‘data’ are mass nouns expressing uncountable quantities. When used as such – and they almost always are – they should be paired with the singular form of verbs. That’s my view, at any rate. And I’d say it’s well accepted.

Here’s the hitch. I’d been saying “data is” and “media is” all my life until I got to grad school, and then I was suddenly surrounded by people who use them prescriptively. It’s gotten to the point that I find myself actually saying “data are” naturally. And by “naturally” I mean not completely naturally, because I always do a kind of inner double take when I catch myself at it. So it’s definitely something bolted on top of my real grammar, but nevertheless something I say without thinking.

As for “media are,” that’s just British, and I seriously hope that one never slips out (though I do hear it all the time around here).

The interesting question is - how many other people come to academic circles and start saying “data are” just to conform? At what point does it become a critical mass, such that it’s actually native for this register? How many of the people that I’ve picked this up from also started out saying “data is” and then gradually found “data are” constructions slipping out, to the point where now they are, in effect, speakers of a “data are” dialect?

I’m guessing the answer to this last question is “quite a lot.” And I’m guessing that further means that this still counts as a prescriptive change rather than an actual register shift for that reason. And I’m thinking that I’ve just dedicated my life to saying “data is” in polite circles.

As for whether “data are” counts as an example of where prescriptivism helps by making language more precise - I don’t really think so, no. I mean, on the one hand it does, because it tends to emphasize that we’re talking about a collection of data points, each of which is a fact in itself, and so for that reason I suppose one could argue that there is apt to be less confusion about the conclusiveness of data. That is, if you think of data as always and only plural, then you’re less likely to make the mistake of thinking of it as a monolithic thing and more likely to keep in mind that conclusions are abstractions over a mass of evidence. Fine. But on the other hand, any researcher worth his salt should be able to keep this in mind without having to tinker with the language to manipulate himself into it. Fixing this perception is the responsibility of introductory methods classes, not standard English usage. Now, one could argue that it’s nevertheless important that the language lay bare its semantics. And that’s true, up to a point. But there’s also something to be said for grammatical consistency. The truth is that in American English we are largely in the habit of treating collective entities in exactly the way PrawfsBlog suggests we treat “data.” We say “Congress is unpopular” without any confusion on the point that some individual Senators may well be popular, in contrast to their colleagues - or that opprobrium may be focused on the House more than the Senate. But the decisive thing here for me is the stiltedness of “datum.” It’s all very well to insist that “data” is plural if you’re in the habit of using its singular form. But so very few people are - and certainly I’m not one of them. Like most people, I feel more comfortable talking about a “data point” or a “piece of data” than I do a “datum.” In fact, I’d go so far as to say that I’ve never used the word “datum” in my life.
Worse than that, when I consult my language competence I find that I am capable of saying “datums” - as in “the datums in this cluster are closer together than the datums in that one.” No, it doesn’t sound completely natural and no, I would never, ever actually use that with a straight face. But the point is that it doesn’t sound any more ridiculous to me than just saying “well, one datum that I saw said…” Meaning that “datum” isn’t really a word that I’ve internalized. Meaning, in turn, that I don’t have a singular form of the supposedly-plural “data.” Meaning that when I want to talk about a singular “data” I do it in the way that I normally singularize a mass/collective - by using a countable construction containing that “plural.” Conclusion: “data” is not a standard plural for me, and it is only because I’ve been hanging around pretentious academics that I’ve learned to use it that way.

I REPENT!

on Nov 8th, 2008What’s the Non-Loaded Version of “Crotchety?”

Every profession has its bugbears - those bits of “common sense” that fall into its domain that the public earnestly believes in but which are totally incoherent when examined. For Economists, it’s the make-work fallacy, for Astronomers, it’s the idea that proximity to the sun causes the seasons, for Statisticians it’s likely to be cum hoc ergo propter hoc. For Linguists, of course, it’s prescriptivism.

But click on the link and you will read a good argument that sometimes people take the hunt for Prescriptivists too far.

There are some things that look superficially like prescriptivism but aren’t. One of these is lamenting the loss of a useful distinction. For example, a pet peeve of mine is the incorrect use of abbreviations in footnotes in scholarly writing. All too often nowadays I see v., cf., and viz used as if they all meant “see”… Now, why is my dislike for the conflation of these three abbreviations not prescriptivism? It is because what I decry is not deviation from a standard merely because it is deviation but because it results in the loss of a useful distinction. When I encounter cf. in a recent paper, I can no longer assume that the author is pointing me at a view differing from his own or a study using another methodology. If that is what I am looking for, I may waste a trip to the library. Furthermore, the loss of this distinction is not really a natural linguistic change. After all, the whole system of scholarly apparatus is specialized and artificial. The reason that this distinction is being lost is that those responsible for training scholars have largely ceased to teach it. Students are expected to pick it up, and all too often they fail to pick up on some of the details.

I’m not sure I agree that this isn’t prescriptivism. Specifically, I don’t follow the argument that the fact that a particular linguistic use is “specialized and artificial” absolves it from all such associations. I would have preferred a wording where we acknowledged that this was “prescriptivism” in the broad sense of the term (making normative statements about language use based on a listener’s ideal rather than patterns of popular use), but that it was a permissible example because this is a context where we are outside the normal domain of popular use. Yes, in some sense there is such a thing as a “popular use” of an academic formalism among academics, but using it as a standard is nevertheless linguistically inappropriate because academic discourse was designed with precision in mind and is not meant to be ordinary linguistic communication. By refusing to label anything as “prescriptivist” that is not intended negatively, Poser is, in fact, skating dangerously close to being found guilty of his own accusation: he’s blurring meaning distinctions.

Prescriptivism for me is any time when someone elevates wishful thinking about how language “should” function over evidence of how it actually does function. In most cases, this will be a bad thing for all the familiar reasons. But there are some times when it is not, and Poser’s example of chiding people about “incorrect” use of cf. qualifies as one of them.

An even more interesting example that has been in the news recently: Joe Klein’s quizzical assertion that Palestinians cannot be antisemites:

Here we have the McCain campaign’s execrable Michael Goldfarb slinging around accusations of anti-semitism–a favorite pastime, as we’ve seen this year, among Jewish neoconservatives. I’ve never met Rashid Khalidi, but he is (a) Palestinian and therefore (b) a semite, so the charge of anti-semitism is fatuous. (emphasis mine)

Here’s an example of prescriptivism gone mad, obviously - and yet I think there’s an important point to be made about what Klein is saying.

Clearly, on the face of it, Klein is being ridiculous. In popular parlance, “antisemite” means “someone with an irrational prejudice against Jews.” Being a Palestinian actually makes Khalidi more likely than average to suffer from this malady from a purely statistical point of view. So it’s unfair stereotyping, perhaps, but it’s not “fatuous.” Klein’s assertion here is particularly ridiculous because the reader can be reasonably sure he doesn’t believe it himself. People are not commonly in the habit of analyzing the constituent parts of words and using the inferred meaning in contradiction to the way people around them use those words. Such people exist, of course (Bill Buckley springs to mind), but they are generally ridiculed as pretentious. No, Klein knows he’s making a mistake here - he’s just angry enough at the time of writing not to care, I assume.

Notwithstanding, I think he (unwittingly) raises a legitimate complaint. Namely - there IS some sense in which the word “antisemite” should mean [opposed to] + [semites]. Here’s why I think so.

Language IS a compositional beast. If I give you a new word - say wug - and tell you it’s a verb and ask you to use it in the past tense, you are likely to come up with wugged, and I am likely to agree with you. There is, of course, some debate about whether that’s really a rule application or just by analogy with “hug” and “tug,” but the debate becomes less acute the longer the word in question. Longer words are generally highly infrequent, and so it begins to stretch credulity that anything other than rule application could be involved. Wugged may well come from hugged and tugged, but wugulforentised? Hardly.

Antisemite itself is a pretty obvious construction from anti and semite - and it helps that anti- is so superproductive in English nowadays that you can apply it iteratively almost without bound (the infamous “anti-anti-anti-missile defense system system system”). This is a word that simply MUST be the result of composition. And so it apparently is. According to the article, it has its origins in German racialist writings of the mid-to-late 19th century. That it rapidly came to be directed exclusively against Jews is simply an artefact of the fact that Jews were common and occupied positions of power in Europe whereas there were few, if any, Arabs about, and what few there might have been would not have been in a position to be seen as politically threatening. But the productivity of “anti-” and the general familiarity with the broader use of “semite” to include not just Arabs but some other races as well means that the compositional meaning of the word is still available in the system. So there is a real tension there. “Antisemite” does mean “anti-jewish,” but it’s still easy for us to see how it could have been otherwise.

I would insist that the key points here are two. It isn’t merely that “semite” is available in its broader meaning, it’s that “anti-” is as productive as it is. Consider another recent controversy - the use of niggardly - meaning “stingy” - which some misinterpret as having a racist etymology. In fact, it is probably of Scandinavian origin and has nothing to do with black people. But the sound association with “nigger” was too much for some people, and so David Howard (a mayoral aide who used the word in a press conference) had to resign his position. Since the word “nigger” is available as a racial slur, and since “niggard” sounds like “nigger,” and since “niggardly” seems a plausible adjectival derivation from “nigger,” etymologically uninformed people easily got the wrong impression. In this case that impression was mistaken, but it is important because it illustrates that meaning-building mechanisms for unfamiliar terms do exist. It is because of this that people like Klein are able to exploit the compositionality of “anti-semite” to suggest that it means something other than it does.

So here’s the punchline. I think critique of “antisemite” as a confusing word is also legitimate prescriptivism. I’m not actually advocating that we change the term, of course. What’s done is done - the term exists in its present form and is clearly understood by everyone. What I am saying is that I have some sympathy with people who get fussy about these things as they’re forming. To cite my own personal pet peeve - it irritates me to no end that people call Democrats “liberals” when that term is at odds with how “liberal” is used in Economics. As if it isn’t inconvenient enough that learned people have to juggle two wildly divergent uses of the term in spheres that have a tendency to overlap (discussions of Economics often turn into discussions of upcoming elections), I think political neophytes are actually misled by them. On being introduced to Democrats and Republicans, they ask what each stands for, and a parent, who doesn’t really know, makes the obvious leap of logic and says “well, Republicans are ‘conservatives,’ which means they want to keep things as they are, and Democrats are ‘liberals,’ which means they want to free things up to change.” And if only it stopped there … but it doesn’t. Bill Buckley, on “founding” modern conservatism with the initial publication of National Review in 1955, subtitled his maiden column “standing athwart history yelling stop.” I think it was perhaps a bit too convenient a counterpoint to the then-fashionable Marxist historicism. The Marxists claimed that History was a science, that events were predestined to flow in their direction, and Buckley’s little quip about “standing athwart history yelling stop” then forever after confounded opposition to the pace of cultural change with opposition to socialism. They are not the same thing, and intellectual discourse has suffered for it.

Lest anyone think that these examples are only ever political, let me cite Noah’s favorite pet peeve on this front, which isn’t at all. Noah likes to complain that people mix up “linear” and “sequential.” It doesn’t bother me as much, since I don’t work as much with math as he does, but I can easily see the point. In Syntax, when we speak of “linearization” functions, we should really be calling them “sequentialization” functions - because it isn’t so much the lining up of morphemes that matters as determining their order (indeed, to get nitpicky about it, if you face a linebreak then the final words of your sentence will actually precede the first in left-to-right order). If this seems like splitting hairs - well, it is. But academic discourse, as strongly implied by Poser above, is all about splitting hairs. Academic discourse is artificial precisely so that we can speak with more precision than we do in everyday conversations. “Linear” in its ideal definition says nothing about order and everything about relations. The function that converts inches to centimeters is “linear” because for every one inch you increase length, you have increased it by a predictable 2.54 centimeters. The proportion of inches to centimeters never changes, though the quantity of each certainly does. This meaning of “linear” has to do with lines in that any plot of the function will be one. Of course it’s easy to see where the “sequential” use of “linear” came from: sequences are also easily (and therefore frequently) illustrated with lines. The trouble with this analogy is that it was unnecessary. We already had the word “sequential,” and there was therefore no need to sacrifice precision by expanding the coverage of “linear.” It happened, it’s done, and I don’t think Noah actually advocates for “correcting” people on this front, he just finds it all mildly frustrating.
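For the programmers in the audience, here is a rough sketch of the distinction in Python. It is only an illustration: the function names (inches_to_cm, looks_linear, sequentialize) are made up for this post, the check only samples a handful of points rather than proving anything, and the conversion factor used is the exact 2.54 centimeters per inch.

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch


def inches_to_cm(inches: float) -> float:
    """A linear function: the output is always a fixed proportion of the input."""
    return inches * CM_PER_INCH


def looks_linear(f, samples, tol=1e-9):
    """Spot-check the two defining properties of linearity on sample points.

    Additivity:  f(x + y) == f(x) + f(y)
    Homogeneity: f(3 * x) == 3 * f(x)
    This is a sanity check on a few values, not a proof.
    """
    for x in samples:
        for y in samples:
            if abs(f(x + y) - (f(x) + f(y))) > tol:
                return False
        if abs(f(3 * x) - 3 * f(x)) > tol:
            return False
    return True


def sequentialize(positioned_items):
    """A 'sequential' operation: it only decides what comes before what.

    Takes (position, item) pairs and returns the items in order.
    Nothing about proportion is involved.
    """
    return [item for _, item in sorted(positioned_items)]


if __name__ == "__main__":
    print(looks_linear(inches_to_cm, [0.0, 1.0, 2.5, 10.0]))
    print(looks_linear(lambda x: x * x, [1.0, 2.0]))
    print(sequentialize([(2, "c"), (0, "a"), (1, "b")]))
```

The first function is “linear” in Noah’s sense; the last one is what syntacticians usually mean when they say “linearization,” and it has nothing to do with lines at all.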

So I think there is a place for what we might call “etymological prescriptivism.” We make normative statements about how people should use certain words in certain contexts on the belief that discourse would in general be clearer if people adopted our recommendations. If there is no need for two words that mean “sequential” and indeed using both of them interchangeably is likely to lead to confusion, then there is a case for moderating at least one’s own speech to try to eliminate the overlap. And it is on this basis that I make a point of avoiding using “liberal” to describe socialists. Since “socialist” is a loaded term, I am polite enough to say “leftist” instead - but the point is that I think political and economic discourse would gain by finding a more convenient way for people to separate classical liberals from contemporary liberals since the two are not of the same philosophy at all. And finally, yes, I think it would be worthwhile coming up with a term that means what “antisemite” originally meant. It’s not, after all, difficult to imagine people who are opposed to both Jews and Arabs, and for the same reasons. Perhaps it isn’t an anthropologically useful category, but there is certainly a political use for a term that means “people of the Holy Land,” regardless of whether they are Jews or Arabs. Certainly some will object that we shouldn’t be in the business of manufacturing politically uncomfortable categories. But I would respond that this is the same dodge that the politically correct crowd uses. Rather than dispute the ideas, they seek to change the terms, with the result that all the attitudes they oppose merely linger beneath the surface. If you want to fight something, it helps to be able to name it.

As for Linguists - I think there are many ways in which the crusade against “prescriptivism” has gone too far. It’s a bit like opposing “goto” statements in Computer Science. It isn’t that it’s not a good idea in most cases, it’s just that it’s impossible to do completely, and so a more open discussion of the topic wouldn’t hurt. I can remember a sly Dan Friedman in class saying “because a function call without any arguments is a goto” to gasps from the crowd, and it was really gratifying. The point was just that you can’t completely eliminate gotos from your semantics, even if you can stop providing the programmer with easy access to them in the way you style your language. Well, so it is with Linguistics. Saying that prescriptivism is always bad disempowers people from employing it in those rare cases when perhaps they should. Certainly it stops them from recognizing it in action in all its forms.

So let me get prescriptive about prescriptivism. I think we could do with a more honest definition of the term - one which means what it means now, but without the negative spin. “Prescriptivism” should go back to being an academic term rather than a value judgment, and people can state their value judgments independently of the term. And then we can employ “prescriptivism” in instances where it perhaps should be employed - for example, in lamenting the collapse of a distinction between socialism and classical liberalism in modern political discourse - or, indeed, in complaining that academics no longer know what all those Latin initials “really” mean.

In short, time for a breath of fresh crotchetiness after all this stifling flower power nonsense.

on Nov 3rd, 2008Not Notational Variants (Exactly)

In Syntax Reading Group we’ve been reading Negation in Slavic, a collection of papers on the titular subject. Today’s was The Morphosyntax of Polish Verbal Negation: Towards an HPSG Account by Anna Kupść. It’s an interesting paper because it really hammers home the differences between HPSG and the so-called “Standard Theory.”

There’s an attribution - probably apocryphal - of Chomsky saying that HPSG is nothing but a “notational variant” of “mainstream” syntax. It’s tempting to write this off as either trite or condescending. It’s trite in the sense that any syntactic framework should aim to account for the full range of syntactic phenomena; it’s condescending in that it’s uncharitable to think of theories designed by such intelligent people as Pollard and Sag as motivated by nothing other than notational preferences. In either case, it raises the question of why anyone would bother to take the time to make up a theory that adds nothing to the discussion. But pause on that and it immediately occurs to you that in fact no one has ever written a full comparison between HPSG and the so-called “Standard Theory.” The apocryphal Chomsky quote may be on point, for all we know!

So whenever I read HPSG papers I’m constantly on the lookout for things that would clue me in to what the fundamental differences are. Are there grounds for preferring one framework over another, and if so, what are they?

For me the “HPSG question” has always broken down like this. The advantages to HPSG are two: it is not (obviously) directional, and it really only possesses a single mechanism for explanation. The first is nice because humans are consumers as well as producers of utterances. The so-called “Standard Theory” is good at production, but not so much at parsing. Working one way, everything is nicely restricted; working the other way, it’s a hopeless data explosion. The second is nice because it keeps researchers honest. One cannot simply invent mechanisms willy-nilly to account for new findings as is possible in the so-called “Standard Theory.” Everything in HPSG must be explained in terms of feature unification.

The so-called “Standard Theory” has only one advantage, but it’s hugely important: GB/Minimalism makes transparent generalizations. With HPSG, everything must be explained in terms of local feature unification, even when it’s not obvious how to do that. In particular, this makes word order and long-distance dependencies a bit problematic. There’s no doubt whatever in my mind that HPSG can capture all of the relevant generalizations, it’s just that several of them require rather elaborate feature specifications of the type that sometimes leave one with the impression that it captured the data but missed the point.

Of course, this subject requires a book-length treatment, hardly the sort of thing that can be handled properly in a blog post. I just wanted to say that the Kupść article linked above is a nice illustration of one area where the two frameworks are not mere “notational variants,” and where I think HPSG is better suited to the data.

The problem with Polish negation is that although everyone agrees that negation is a syntactic phenomenon, it behaves in Polish in some ways as though it were an entirely lexical phenomenon.

For example: some Polish verbs are only ever negative, and others seem to have no negative form at all. For some verbs, the negative particle can be separated from the verb by an auxiliary, for others it may not. This sort of “case-by-case” approach to rules is typical of lexical phenomena. Syntactic principles, by contrast, should be universal.

Without really getting into the details, the analysis in the paper takes the approach that some verbs are prespecified for being negative or not in the lexicon. For these verbs, negation is not really a syntactic but a lexical phenomenon. For all the other verbs, negation is syntactic just as it is in all other languages.

The point is that in HPSG, where all syntax is in the lexicon (in the form of lexical features) anyway, you have a good way to “fudge,” as it were, on whether a phenomenon is lexical or syntactic. It works like this. All words (of any category) come with a head feature NEG. For most words this will be unspecified. However, for some it is specified in the underlying lexical entry. Syntactic rules in HPSG operate by taking two items (in the typical case - sometimes it’s more, sometimes less) that meet their structural description and unifying with those items. In other words, the rule itself acts like a lexical structure that simply fills in its missing blanks with subordinate structures (which can be words, or other such complex structures, actually). Rules can be made to apply to some items but not others by setting features on a “rule” object in such a way that they won’t unify with certain items. In the crudest case, you could simply make a boolean feature “Applies-to-me?” and set it to + or -, and then set the same feature to + on the rule, thereby excluding any items that were pre-specified as -.

So for Polish negation it’s quite simple, actually. Rules can be set up so that they will only unify with items that are not already specified [NEG +]. Those items that are so specified cannot be the arguments to the rule, and thus will behave differently from items that are unspecified for NEG. In a very elegant way, you resolve your “tension” between the language-independent generalization that negation belongs in the Syntax on the one hand and the hard evidence that some Polish verbs form lexical exceptions to this generalization on the other. Since all Syntax is in the lexicon for HPSG, it is easy to write rules that apply to some words and not others - without losing sight of the larger generalization. This is very cool.
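Since I’m a Computational Linguist, here is roughly how I picture this in code. It is a toy sketch in Python and not the analysis from the paper: feature structures are flattened to plain dictionaries, None stands in for “unspecified,” the lexical entries verb_a and verb_b are hypothetical placeholders rather than real Polish verbs, and I am assuming (for illustration only) that the syntactic negation rule asks its daughter to be compatible with [NEG -] and builds a [NEG +] mother.

```python
UNSPECIFIED = None  # stands in for a feature the lexicon leaves open


def unify(a: dict, b: dict):
    """Unify two flat feature structures.

    Returns the merged structure, or None if any feature has
    conflicting specified values. Unspecified values unify with anything.
    """
    result = dict(a)
    for feature, value in b.items():
        if value is UNSPECIFIED:
            result.setdefault(feature, UNSPECIFIED)
            continue
        existing = result.get(feature, UNSPECIFIED)
        if existing is not UNSPECIFIED and existing != value:
            return None  # clash: e.g. [NEG +] against [NEG -]
        result[feature] = value
    return result


def apply_syntactic_negation(verb: dict):
    """Toy 'rule object': requires a daughter compatible with [NEG -].

    Verbs already specified [NEG +] in the lexicon fail to unify,
    so the rule simply never applies to them.
    """
    daughter = unify(verb, {"NEG": "-"})
    if daughter is None:
        return None
    # The mother (the negated phrase) is [NEG +] and records its daughter.
    return {"NEG": "+", "DAUGHTER": daughter}


if __name__ == "__main__":
    verb_a = {"PHON": "verb_a", "NEG": UNSPECIFIED}  # ordinary verb
    verb_b = {"PHON": "verb_b", "NEG": "+"}          # lexically negative verb
    print(apply_syntactic_negation(verb_a))
    print(apply_syntactic_negation(verb_b))
```

The crude “Applies-to-me?” trick from above is the same mechanism with a different feature name: put the feature on the rule, prespecify the opposite value on the exceptional items, and unification does the excluding for you.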

I don’t know what the Standard Theory would even do about cases like this. In that theory, for all its pretensions to being a “lexicalist” approach, Syntax and Lexicon are actually quite separate, and it’s hard to write your Syntax in such a way that it applies to some distinguished items of a class in different ways than it does to other members of the same class.

The point of this is not to advocate for HPSG - though certainly I think this case is a plus in the HPSG column. The point is just to note that the two theories are NOT “notational variants,” that though it perhaps looks like that in a lot of cases, they do represent very different approaches to the study of syntactic phenomena grounded in different priorities, and some syntactic phenomena come more naturally to one theory, others to the other.

Of course, as a Computational Linguist, I think all such disputes should be resolved on the basis of which comes with the more tractable implementation algorithms!

on Oct 21st, 2008The Merit in What I Do

One of the reasons that I like the “Computational” part better than the “Linguist” part of my job description is that the “computational” part doesn’t give you any bullshit. A program either runs or it doesn’t, it either gets the result it’s supposed to get, or it doesn’t. And while for very complicated programs it’s not always immediately obvious that it’s not doing what it’s supposed to, it eventually not only becomes clear that it isn’t, but it is always possible, with a certain amount of effort, to explain why it’s failed.

In Linguistics, by contrast, it can be frustratingly hard to separate out the assumptions from the conclusions. And indeed, certain Linguists - infamously including Noam Chomsky - actually take advantage of that truth to avoid criticism. The Minimalist Program is a “Program” and not a “theory,” after all, because it wants to adopt certain assumptions without having to justify them.

I’m actually not opposed to this style of research in principle. In fact, I’m not sure how else Syntax is supposed to operate. As van Riemsdijk and Williams put it in the introduction to their excellent syntax primer:

The material in this book constitutes a detailed and specific theory of grammar. As such, it naturally rests on strong assumptions about the domain of phenomena that the theory of grammar is about, and about the role of the theory of grammar in the general theory of language. These assumptions are supported to the extent that the resulting theory of grammar gives satisfying explanations, and to the extent that it supports or “meshes with” theories concerning other aspects of language.

Right. There is no other way to do Syntax - and that’s a fault of the fact that, again using the words of van Riemsdijk and Williams, “It is by no means obvious that the study of grammar is not an arbitrarily defined subdiscipline most properly dissolved in favor of some combination of studies.” Put another way, while it seems obvious to me that there are syntactic phenomena, it is not perfectly obvious, and for that reason people in my line of work often feel the need to apologize for what they do.

They don’t, of course, actually apologize. What they tend to do instead is internalize these feelings behind walls of dogma, perceiving - largely correctly, in my experience - that they are surrounded by people who think what they do is meaningless.

So it’s nice to read in the Briggs Blog today someone who thinks this is a characteristic of any “scientific” field that approaches the humanities. Quoting the man himself:

The closer a field of study is itself to politics or any area which involves human behavior, the more the consensus acts to keep people in line than it does to promote innovation. Non-consensus ideas are not welcome. Professors holding verboten thoughts are not hired, or if they are found out, they are let go, or they even leave voluntarily, tired of the process.

So it’s not just us. It’s Psychology, and Economics, and Sociology, and All that Jazz too. And he gives a possible remedy:

The solution seems to be, because people in areas which involve humans are prone to ill-informed zealousness, that they should all be taught and consistently reminded that they might be wrong. This is the reason, after all, that, on average, people involved in physical areas are humbler: they have seen and verified their failures, and they have seen and acknowledged that their predictions sometimes are a bust.

I would say that’s actually the lesser half of the story. The greater half is that they know their colleagues have experienced similar failures. One of the things that I noticed about Computer Science culture when I started taking classes in that Department is how much failure professors admit compared to students. Which is to say, a lot relative to virtually none at all. And it isn’t too hard to figure out why: professors are tenured and proven, while students are still in competition with each other. So you get these odd situations where the professors come off looking really dumb, admitting to the suboptimal solutions they originally found to the problems they’re writing on the board, or confessing that they can’t read Java code, or whatever - while the students are busy stretching their hands as high in the air as possible to drop comments about having casually done something last night while messing around that’s known to be difficult. In reality, of course, the professors know the subject much, much better - the difference is just that students don’t feel comfortable admitting failure in public yet because they haven’t seen their colleagues do it.

I think the trouble with Linguistics isn’t that we’re not constantly reminded we could be wrong. Au contraire - Linguists are more brutal about this than people in most fields I know. They LOVE pointing out their colleagues’ mistakes. What’s lacking isn’t the Penance, in other words, it’s the Priest. We’re constantly casting stones and reminding each other just how wrong it’s possible for us to be - the problem is that there isn’t anything forcing anyone to admit that a blow’s been landed. And so we don’t get the critical mass of examples of colleagues publicly admitting failure necessary to create a comfort zone in admitting failure ourselves. It’s an Economics question, really. When a good is scarce, it’s expensive - when it’s ubiquitous, it’s cheap. If you’re in a profession where examples of failure are “a dime a dozen,” to cash in on the pun, then it costs you nothing. But if you’re in a field where people rarely admit it (because they rarely have to), then the cost of a public confession of failure is quite high, and you think twice about it.

So I don’t think the remedy is reminding people that they “could be wrong.” I think the remedy is finding ways to prove people wrong and employing them mercilessly. There’s that oxymoronic military line about how “we had to destroy the village to save it.” In science, I’m not sure it’s an oxymoron. I think a little bloodletting is actually healthy. It’s sort of the way you have to first train a fighter to take punches before you teach him to avoid them. I think the main problem in humanities-adjacent fields like Linguistics is that people don’t take enough punches, and so they’re so scared of them that they curl into little balls in the corner of the ring rather than getting up and having it out. More accurately, what they don’t realize is that it takes more than a single blow to fell a man. Anyone can take a couple of punches - and in fact you don’t generally get in a position to win a fight without getting close enough that many of your opponent’s punches land. Linguists need to get away from the notion that a single counterexample disproves a theory, that any single punch is going to be a knockout blow.

How to accomplish it? My experience is that the laws of Economics may be subtle, but they are laws. So one thing I know isn’t going to work is direct approaches - like reminding people to remind each other to be humble because they may be wrong. The only way to fix it is to change the incentives, to, as it were, lower the price of failure. And the only way I know of to do that is for there to be a lot more failure about for people to see. I can’t solve it - but I think I can make a contribution. A parser-generator for Minimalism along the lines of the LKB for HPSG will at least realize the possibility that there could be a database of sentences that have been used in syntax papers against which people could test their tweaks to the theory - to see just which sentences that were formerly grammatical are no longer predicted to be under the new version of the theory, for example.
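To make the idea a little more concrete, here is a minimal sketch of the kind of regression check I have in mind. Everything in it is hypothetical: the two "theories" are just Python predicates standing in for successive versions of a grammar, and the three-sentence corpus and its judgments are invented for illustration. None of this is the LKB's interface or the output of any real parser.

```python
from typing import Callable, Dict, List

# A "theory" here is just a predicate: does it predict the sentence grammatical?
Theory = Callable[[str], bool]

# Hypothetical corpus: sentence -> grammaticality judgment reported in the literature.
CORPUS: Dict[str, bool] = {
    "Who did you talk to": True,
    "You talked to Mary": True,
    "Talked you Mary to": False,
}

def old_theory(sentence: str) -> bool:
    # Hypothetical version 1: crudely rejects verb-initial strings only.
    return not sentence.startswith("Talked")

def new_theory(sentence: str) -> bool:
    # Hypothetical version 2: the same, plus a tweak banning a final "to".
    return old_theory(sentence) and not sentence.endswith(" to")

def regressions(corpus: Dict[str, bool], old: Theory, new: Theory) -> List[str]:
    """Attested-grammatical sentences the old version accepted but the new one rejects."""
    return [s for s, judged_ok in corpus.items()
            if judged_ok and old(s) and not new(s)]

print(regressions(CORPUS, old_theory, new_theory))
```

The point of the design is only that the check is mechanical: a tweak that silently loses a previously covered sentence shows up in the list, whether or not its author was looking for it.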

on Oct 18th, 2008Another Reason Why Girls Might Say ‘Holded’

Joshua K. Hartshorne and Michael T. Ullman, “Why girls say ‘holded’ more than boys,” Developmental Science 9:1 (2006): 21-32. [PDF]

One of the reasons I feel confident that the pendulum has started its swing back toward symbolic approaches in language research is that the reductionist crowd now regularly engages in all the reckless conclusion by assumption on which they (rightly, in many cases) originally based their criticisms of the symbolic approach. The paper reviewed here is as brazen an example of Affirming the Consequent as one is likely to find in which the authors still bother to collect data.

The overall problem is this: a series of recent studies have shown that in general females outperform males on verbal memory tasks such as recalling words from a list. For this reason, we might expect young girls to overgeneralize less often than boys when producing past tense forms. That is, we might expect that girls would be less likely to produce the ungrammatical holded in place of the grammatical held, and similarly for similar examples. The basis of such a hypothesis is the intuitive belief that regular forms are produced by a rule (e.g. of the form “add -ed to a stem to form the simple past tense”), whereas irregular forms must simply be memorized. This is an appealing notion primarily for reasons of memory efficiency: while there is perhaps a performance gain in memorizing frequent regular forms for rapid retrieval (e.g. worked), it seems a waste of brain space to bother explicitly storing multiple forms for infrequent items (e.g. pardon and pardons and pardoned) when the manifest regularity of the lexicon provides such an obvious optimization opportunity.

Nothing, of course, can be asserted without confirmation in science, and the researchers found, in the course of trying to document this assumption, that in fact just the opposite was the case. It seems girls are significantly more likely than boys to produce the overregularized forms, even addressing all the obvious confounds (age of speaker, priming by adult conversation partner, token frequency of use, number of utterances produced, etc.). This obviously poses something of a puzzle. Either the studies showing female superiority in verbal memory are flawed, or the relative inferiority at the task in question is a clue to the mechanism behind female superiority in verbal memory tasks generally.

Taking the second route, the authors hypothesize that if the girls have greater associative memory skills - at least for linguistic forms - they may in fact be producing generalizations on that basis which are extended to forms which should not be generalized. That is, forms like folded get in the way for girls, who have generally greater facility in retrieving them, when trying to produce held than they do for boys.

This yields a testable hypothesis. If such interference is in fact occurring, then it should be predicted by neighborhood effects: items that “sound like” many other regular forms should be more likely to interfere than those that “sound like” comparatively fewer regular forms. Defining “sounds like” gets into controversial territory, of course, so the authors ran tests on three separate interpretations:

Rhyme:
the forms in question rhyme (“sinked” - linked, blinked)
Final Coda:
the forms in question end in the same string (“sinked” - linked, blinked, flunked)
Final Consonant:
the forms in question end in the same phoneme (“sinked” - linked, blinked, flunked, barked)

And since irregulars tend to be monosyllabic, these numbers were calculated twice - once for all relevant forms, once only for monosyllabic forms. In total, six conditions, then.
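The three interpretations of “sounds like” can be sketched roughly as follows. This is my own toy illustration, not the authors' procedure: the ASCII transcriptions (a vowel plus a tuple of coda consonants for the final syllable) are assumptions, and "hugged" is added only as a non-neighbor for contrast.

```python
from typing import Dict, List, Tuple

# Toy transcription of a form's final syllable: (vowel, coda consonants).
# The symbols are rough ASCII stand-ins, assumed for illustration only.
Form = Tuple[str, Tuple[str, ...]]

REGULARS: Dict[str, Form] = {
    "linked":  ("I", ("N", "k", "t")),
    "blinked": ("I", ("N", "k", "t")),
    "flunked": ("V", ("N", "k", "t")),
    "barked":  ("A", ("r", "k", "t")),
    "hugged":  ("V", ("g", "d")),
}

SINKED: Form = ("I", ("N", "k", "t"))  # the overregularized target

def similar(a: Form, b: Form, measure: str) -> bool:
    if measure == "rhyme":            # same vowel and same coda
        return a == b
    if measure == "final_coda":       # same coda string, any vowel
        return a[1] == b[1]
    if measure == "final_consonant":  # same last phoneme only
        return a[1][-1] == b[1][-1]
    raise ValueError(f"unknown measure: {measure}")

def neighbors(target: Form, measure: str) -> List[str]:
    """Regular forms that 'sound like' the target under the given measure."""
    return [word for word, form in REGULARS.items() if similar(target, form, measure)]

for m in ("rhyme", "final_coda", "final_consonant"):
    print(m, neighbors(SINKED, m))
```

Each measure is strictly looser than the one before it, which is why the neighbor sets in the list above grow from two forms to three to four.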

The correlations were significant and positive for girls in all 6 conditions save one (rhyme measure over monosyllabic verbs - for which it was positive but not significant) - which is to say, girls are more likely to overregularize those irregular verbs for which there are lots of highly frequent similar-sounding regular examples. Boys showed no such correlations at all, let alone significant ones.

There is no reason to raise questions about the data. All of it comes from the widely-available CHILDES Corpus and is therefore easily replicable by anyone who has or is willing to design appropriate software. What is interesting here is not so much what was found but how it is interpreted.

Put crudely, what the authors claim on the basis of these correlations is that girls are generally better than boys at verbal memory tasks, even when they’re not, and that indeed when they’re not it’s because they are. Their greater facility with retrieving stored memory items by association means that girls are more easily confused by regularities in similar-sounding forms (having stored folded and molded, she reasons, on the basis of sound, that there is likely to be a holded, and this short-circuits the retrieval of the dissimilar held). Boys, who either are not as gifted at these sound-based associations or else are simply not as efficient at retrieving their exemplars, are less likely to be misled by their ability and thus are doomed by their verbal inferiority to retrieve the correct answer more reliably. Something is obviously in need of more explanation.

While there is no basis for doubting the data, there is perhaps reason to assume that this interpretation is a bit selective. To get the tiresome and obvious out of the way - yes, there is some evidence for politically correct bias. From page 30:

However, we are not claiming that females depend only on lexical memory for processing complex forms. Even with their excellent memory abilities, females are expected to compose many types of complex forms, including new and lower frequency regulars, and highly complex linguistic representations, including most phrases and sentences (citations).

The word “excellent” stands out here. “Excellent” by what standard? Surely it is the case, as with all such distributions over populations, that there is great variation within groups on level of ability. An individual can have “excellent” memory ability by standing out among his peers; “girls” as a group simply have a somewhat greater tendency averaged over the group to excel in this area than do boys.

But the concern here is not so much with possible politically correct biases as with research biases. Notice the potential explanations we’re ruling out without properly addressing. Foremost is the possibility that it isn’t so much that girls are better at verbal memory as that boys are better at rule-based learning. If indeed rules are in part a method of data compression - the ability to leverage regularities in the lexicon in the reduction of processing load - then the fact that boys store fewer exemplars is a feature and not a bug. “Less efficient at retrieving” (spurious) exemplars may simply be a theoretically-biased way of saying “have internal search engines with greater precision.” Boys’ brains are better-attuned to capturing the real regularities in the lexicon. That’s an important point, since any measure of competence surely concerns itself as much with the appropriateness of the method employed to the task at hand as with realtime performance at that task (though of course the two are related). It’s a bit like calling a computer that evaluates loans on the basis of the applicant’s credit score “equally competent” as a loan adjustor. Given its performance on a certain set of data, it may appear that it is (and given a lazy loan adjustor, it may even be so in a particular case). The reality remains that the computer is using proxy data to approximate the real task. As with credit scores, of course, sound associations are a very good approximation of the actual regularity because they are themselves symptoms of that regularity. The “Final Consonant” condition in that task above, for example, would be explained by a traditional grammarian with reference to productive phonological rules. You know - the past tense morpheme is voiced when it attaches to a stem that ends in a voiced segment, voiceless otherwise, and there’s epenthesis in some cases (or however the official version goes - I’m not a phonologist). 
Whether or not one believes in such a rule is largely a matter of academic preference, the verdict on which is ultimately dependent on some as-yet-to-be-completed research and philosophizing. What is not in question by either side of that debate is that such a rule, if a psychological reality operational in a language, would produce a dataset ripe for exploitation by the kind of “Final Consonant” sound-based associative method described above. One could ape the regularity before he had learned the actual rule in exactly the way that this study suggests that girls have a tendency to do. That the study chooses to phrase its conclusions in terms of a superior verbal ability on the part of girls, rather than a deficit at rule-based learning that is being compensated for by a proxy crutch, owes to a research bias that favors reductive explanations. (To see that it is a bias, notice that in the passage quoted above the fact that girls can employ linguistic rules is treated as evidence that they always do. Whereas we are asked to assume that there are differentiated relative levels of ability at associative memory, no such assumption seems to be in play for these authors about rule application.)
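For concreteness, the textbook version of the phonological rule gestured at above can be sketched in a few lines. This is a simplification under stated assumptions: the segment inventory is a toy ASCII one that I have made up for the example, and the function name is hypothetical.

```python
# Toy inventory of voiceless stem-final segments (ASCII stand-ins, assumed for
# illustration; "t" is handled separately because it triggers epenthesis).
VOICELESS = {"p", "k", "f", "s", "S", "tS", "T"}
ALVEOLAR_STOPS = {"t", "d"}

def past_tense_allomorph(final_segment: str) -> str:
    """Pick the regular past tense ending from the stem's final segment."""
    if final_segment in ALVEOLAR_STOPS:
        return "Id"   # epenthesis: fold -> folded
    if final_segment in VOICELESS:
        return "t"    # voiceless stem: link -> linked
    return "d"        # voiced stem (including vowels): hug -> hugged

print(past_tense_allomorph("k"), past_tense_allomorph("g"), past_tense_allomorph("d"))
```

Note that the rule's output is fully determined by the stem's last segment, which is exactly why a learner tracking only final-consonant similarity could mimic it without having the rule.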

Reductionism is a kind of unavoidable disease of science. It is the result of twin concerns, each legitimate in its own right: (1) the need to avoid circularity and (2) the need for a transparent mechanism to underlie our explanations. The first needs no justification: explaining something by naming it is no explanation at all. The second is of course related to the first: we don’t feel that we’ve really understood something unless we can replicate it. Reductionist explanations are often appealing here as a way of capturing noise along with the regularities. But the Turing Test for “thinking” is inappropriate for exactly this reason: testing whether something is human is not the same as testing whether it is conscious; humans have some characteristics that are probably incidental to consciousness per se. If we convince someone that something is human by concentrating on where to build in the pauses/hesitations in its speech, we’ve passed the test without really answering the question. Of course it may turn out that the pauses/hesitations are inevitable consequences of the mechanisms that underlie consciousness (in which case modeling them is undeniably useful), but there is no a priori reason to believe so other than assumption. Because of the kind of hair-splitting nonsense that philosophical discussions often produce, I think we’re right to give some weight to operational definitions in science. What concerns me is that they not take the place of real explanation when such is possible. Laboratory word-association tasks are, after all, not real-world linguistic tasks so much as tools for approaching answers to questions about how such real-world tasks are done. It is an error to confound ability on verbal word association tasks in the laboratory with real verbal ability. The one is merely a proxy for the other.
The only thing the alternate explanation offered for the phenomenon under discussion - that it is a comparative deficiency in the rule-application abilities of girls rather than in the sound-associative memory of boys that carries the weight of the explanation - has working against it that I can see is that it would tend to associate the bearer with a currently-unfashionable belief in a mental “rule application” mechanism of symbolic flavor. What it has going for it, of course, is that you don’t run into the absurdity of claiming that evidence of one’s lack of linguistic ability shows just how good at language she really is…

To be fair, the authors do note some of these problems as outstanding issues that will need to be addressed by future research. Noting this has, however, not prevented them from titling their paper “Why girls say ‘holded’ more than boys,” as if this were a question they’d answered decisively. More to the point, nor has it prevented them from talking as though they had done so throughout their paper, tempered only by a single caveat near the end. It’s a classic research error. If P then Q, observe Q, conclude, on that basis, P. It’s a named fallacy, fellas.
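For anyone who wants the fallacy spelled out, a brute-force truth table does it. This is only a sketch of the propositional skeleton (the function names are mine): it looks for assignments where “if P then Q” and Q both hold while P is false.

```python
from itertools import product
from typing import List, Tuple

def implies(p: bool, q: bool) -> bool:
    # Material conditional: false only when p is true and q is false.
    return (not p) or q

def counterexamples() -> List[Tuple[bool, bool]]:
    """(P, Q) assignments where both premises hold but the conclusion P fails."""
    return [(p, q) for p, q in product([True, False], repeat=2)
            if implies(p, q) and q and not p]

print(counterexamples())
```

One such assignment exists (P false, Q true), so observing Q never licenses P on its own. Here P is the associative-memory account and Q is the observed correlation, which other accounts predict just as well.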

on Jun 27th, 2008Talkin’ ’bout my G-g-g-generator

A useful thing for some (other) linguist to do would, I think, be to set up a website for cataloging bad arguments in favor of UG. I’ve just run across a beauty.

I’m writing qualifying papers this summer, some of which are about Syntax, and I thought it might be a good idea to start with some foundational stuff. I never really had a proper foundational course in syntax - for *ahem* various reasons - and most of what I know has been picked up from reading LI and sitting in on discussion groups. There’s a lot of arcana in the field these days, so it never hurts to pick up a textbook and start over … was my reasoning.

So I’ve been flipping through Andrew Carnie’s book, and last night I read the inevitable introductory pro-UG argument. These things are apparently required by Holy Writ of Trade Guild for books on mainstream Syntax. Well, not really - the general argument is that the “standard” approaches to Syntax don’t make sense without UG, which is probably true in some broad sense, but not really in the narrow sense they generally mean. Notwithstanding, every textbook I know starts out with some throwaway rationalist argument that just doesn’t really work. So here’s Carnie’s in a nutshell.

Rather than imagining the trouble inherent in learning a language, which is apparently considerable, we’ll imagine instead someone simply matching sentences with situations. Say, the sentence is the cat spots the kissing fishes, and the child has to match this with a situation (Carnie helpfully provides an illustration of a cat spotting some kissing fishes[sic]).

Her job, then, is to correctly match up the sentences with the situation. More crucially she has to make sure she does not match it up with all the other possible alternatives, such as the other things going on around her (like her older brother kicking the furniture, or her mother making breakfast, etc.).

No objections there. I believe the most quoted statement of this problem comes from Quine, who tells a story of some natives shooting a rabbit and saying “gavagai” and leaving the accompanying white dude puzzled as to whether “gavagai” was the rabbit, or the arrow, or the act of shooting, or some kind of cheer, or … WHAT EXACTLY GORAMIT??? Point being, we’re glossing over some difficult issues blithely saying that kids “hear words in context and pick up their meanings.”

Of course, it’s interesting that Carnie should choose this problem to illustrate in a book about syntax. Surely this is a general learnability issue? I mean, this applies as much to learning words in isolation as it does to learning how to string them together, no? So in that sense it’s kind of an odd retreat to beat.

It gets better.

Let’s make this even more abstract to get at the mathematics of the situation. Assign each sentence some number. This number will represent the input to the rule. Similarly assign each situation a number. The function (or rule) modeling language acquisition maps from the set of sentence numbers to the set of situation numbers. Now let’s assume that the child has the following set of inputs and correctly matched situations (perhaps explicitly pointed out to her by her parents). The x value represents the sentences she hears, the y the number correctly associated with her situation.

And then he gives a table, but let’s make it easier on me typing and just say that 1 gets mapped to 1, 2 to 2, 3 to 3, 4 to 4 and 5 to 5. So the question is, given 6, what do we map it to? Well, you might be tempted to say “6,” but then, foolish mortal, you would have fallen victim to Carnie’s Clever Trap®! In fact, suppose the mapping function isn’t identity, but rather [(x-5)*(x-4)*(x-3)*(x-2)*(x-1)] + x = y. GOTCHA! In this case, x=6 maps to 126. Oops!
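The arithmetic is easy to check. A small sketch, with function names of my own choosing: both candidate rules agree on every input the child has seen, because the product term vanishes for x from 1 to 5, and they part ways only at 6.

```python
from math import prod

def identity(x: int) -> int:
    return x

def trap(x: int) -> int:
    # [(x-5)*(x-4)*(x-3)*(x-2)*(x-1)] + x: the product is zero for x in 1..5.
    return prod(x - k for k in range(1, 6)) + x

for x in range(1, 7):
    print(x, identity(x), trap(x))
```

Any number of other polynomials would do the same job, which is the whole point: a finite set of observations never picks out a single rule.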

And actually, I don’t mean to be facetious. This is quite a good example. The trouble is that it’s nothing specific to syntax or even to language. Yes, it does indeed demonstrate rather nicely a general learnability problem, but how does this imply the existence of UG?

I’m actually quite sympathetic to the idea that at least the foundations of human knowledge are innate, having been pretty soundly convinced of that by Immanuel Kant’s The Critique of Pure Reason when I was 19. Kant’s examples are better, and they crucially deal with general cases, nothing specifically to do with language. The point is that a lot of what we “know” about the outside world is innate, including, for Kant, even the notions that we exist in space and time (yes, I buy his argument there too - but that’s a subject for another post - and probably on a different blog).

This mathematical example is, again, kind of a strange choice for an argument about Universal Grammar, considering it has implications for epistemology in general. In fact, this is a nicer illustration of what Russell called the “Problem of Induction,” in my opinion, than Russell gave himself. (Or, actually, maybe Russell did give this example somewhere else, but I’m more familiar with the famous chicken example from Chapter VI of “Problems of Philosophy”).

Here’s Russell:

And this kind of association is not confined to men … Domestic animals expect food when they see the person who feeds them. We know that all these rather crude expectations of uniformity are liable to be misleading. The man who has fed the chicken every day throughout its life at last wrings its neck instead, showing that more refined views as to the uniformity of nature would have been useful to the chicken.

The chicken’s problem is of course the same as that of the child in Carnie’s example. The child has only ever seen a pairing of a number with itself and thus expects this pattern to continue - but she has the wrong idea about what the pattern actually is. Just as the chicken expects the pattern of its being fed when it sees the farmer to continue - and continue the underlying pattern does, though the chicken was mistaken about the nature of that pattern.

But Russell continues:

But in spite of the misleadingness of such expectations, they nevertheless exist. The mere fact that something has happened a certain number of times causes animals and men to expect that it will happen again. Thus our instincts certainly cause us to believe the sun will rise to-morrow, but we may be in no better a position than the chicken which unexpectedly has its neck wrung. … The problem we have to discuss is whether there is any reason for believing in what is called ‘the uniformity of nature’. The belief in the uniformity of nature is the belief that everything that has happened or will happen is an instance of some general law to which there are no exceptions. The crude expectations which we have been considering are all subject to exceptions, and therefore liable to disappoint those who entertain them. But science habitually assumes, at least as a working hypothesis, that general rules which have exceptions can be replaced by general rules which have no exceptions. ‘Unsupported bodies in air fall’ is a general rule to which balloons and aeroplanes are exceptions. But the laws of motion and the law of gravitation, which account for the fact that most bodies fall, also account for the fact that balloons and aeroplanes can rise; thus the laws of motion and the law of gravitation are not subject to these exceptions.

And I think if Carnie had continued along something like these lines, we would be in a better position. Because this is indeed what we syntacticians are doing. Languages are remarkably the same - though they may not appear so on a superficial glance - the world over in terms of syntactic phenomena. It is indeed striking enough that we would like to, if possible, capture these similarities in terms of universal laws that admit of no exceptions, that indeed explain the superficial exceptions. Where I can’t follow this argument is to the point that these laws arise from some biological specialization for language.

Surely Carnie is falling victim to his own trap here. Granted, a biological specification for language is the superficially most plausible explanation. Laws of gravity are laws of objects, laws of language are things specific to a human-produced communication system, so it seems reasonable to look for their cause in biological specialization. But there is no reason we should necessarily look there. There are plenty of other plausible explanations - most notably that UG phenomena could be explained out of simple biological economy using something roughly akin to our explanation for the fact that highly frequent items are more likely to be irregular (because it’s too much trouble for people to remember exceptional forms for words they seldom use, so they devise rules for the past tense of things like “disarm” but are happy to use “went” as the past tense of a frequent item like “go”). In Carnie’s terms, there is an underlying pattern, but we’ve no way of knowing what it is, exactly.

To the extent that Carnie is merely offering some background about learnability for the purpose of advancing UG as a plausible working hypothesis, he’s on solid ground and I support him. This would be something similar to Pinker’s defense of the idea that research should be done into innate cognitive gender differences, even as Pinker himself remains uncommitted as to whether there are such things. Unfortunately, that this is not all Carnie is doing is made clear by the transition to the next section:

The evidence for UG doesn’t rely on the logical problem alone, however.

Just like that - as though he’d even bothered to present any logical problems that related to language learnability exclusively, i.e. that were not general epistemological problems of knowledge of the outside world in general.

These sorts of things are risible to me since I don’t really see the need for a biological UG to justify the study of syntax in the first place. The fact is, there are syntactic regularities, and this can easily be demonstrated by appeals to the students’ own native intuitions about the classroom language. Where those regularities come from, ultimately, is an interesting question, but it is not a question for syntacticians. Psycholinguists (and neurolinguists, for that matter, to the extent that there really is a language-specific UG) are much more qualified to address those issues than we are. Our job is merely to model how the system works, to describe those regularities that require careful attention to uncover. That these are numerous and subtle enough to justify a field of inquiry has been amply demonstrated over the last 40-50 years to anyone who cares to glance through LI. The “learnability problem” really isn’t, or shouldn’t be, our main preoccupation.

In particular, given the multiplicity of possible explanations for how children acquire the subconscious knowledge of their language they acquire, both proposed and yet-to-be-proposed, and given the highly specialized psychological or information-theoretic or biological knowledge that will be necessary to adjudicate between them, it seems silly to ask people armed only with Chomsky to pronounce an opinion on the subject one way or the other. Certainly biological UG is a plausible working hypothesis, but it is only one of many, and we just don’t have the information before us to venture much more than a guess as to the nature of UG at this point.

Now, I did say that Carnie continued on in another section. So maybe there are better arguments there?

No such luck, actually. Carnie’s next item for consideration is the that-trace effect. Given this pair of sentences:

(a) Who do you think that Ciaran will question first?
(b) Who do you think Ciaran will question first?

a reasonable conclusion for a learner is that complementizer “that” is simply optional. A further addition to the dataset seems to confirm it:

(c) Who do you think will question Seamus first?

So again, we have one of “these pattern things.” The underlying pattern appears to be that you can simply omit “that” if you feel like it. But then along comes something that causes us to question this conclusion:

(d) *Who do you think that will question Seamus first?

Mysteriously, in English, complementizers are prohibited when it’s the subject that’s been extracted. How do children learn this? After all - it’s hugely implausible that they’ve ever heard sentence (d) and been told it’s wrong. Clearly ungrammatical sentences of this kind generally fail to be produced at all. So how do they learn it? Must be UG, right?

Well, again, maybe. But equally plausible seems to me that whatever subconscious model of language they’ve formed from the sentences they’ve heard simply predicts this for them. To cite an example from my own life - when I first learned to play Shogi (Japanese chess), no one told me the rule that you can’t have two pawns on the same file. In Shogi, you see, you can drop pieces that you’ve captured onto the board in place of making a move (so capturing pieces is more like “converting” them to your side). But there’s a ban on using this feature to simply line up masses of pawns on the same file. No one told me that specifically, but when my opponent did it, I turned to the person teaching us the game and asked if it was legal. Surprised, he said that it wasn’t, and then asked if I’d ever played Shogi before, since he could think of no other explanation for how I “knew” that. But I hadn’t played before - it just “felt wrong” is all. Point being, I think it’s really hasty to rush to conclusions about language-specific innate brain modules just because children are able to generalize from the pieces of the system they’ve acquired so far to pieces no one has explained to them. Clearly, there is some kind of innate reasoning ability over systems and rules, but it doesn’t have to be specific to language, and it certainly doesn’t have to be as specific as a parameter setting for the that-trace effect. Again, I would stress that the job of the syntactician possibly includes raising these questions, but definitely not answering them. We simply describe the system in as much detail as we possibly can - and maybe (hopefully!) something about the regularities we uncover will give psychologists and biologists a clue as to how children do whatever it is they do.

I don’t want to be too critical of Carnie’s book in general, I should add. There are some sections of the first chapter that I really appreciate - particularly the boxes on pp. 10 and 12 responding to common criticisms about the existence of rules and the validity of basing a science on “intuitions.” From the former:

… a brain is a mass of neurons firing, how can formal mathematical rules exist up there? Remember, however, that we are attempting to model Language, we aren’t trying to describe language exactly. … Obviously the rules don’t exist, per se, in our brains, but they do represent the external behavior of the mind.

Quite correct. There is a certain subspecies of phonetician/cognitive scientist (the kind that likes to refer to itself as a “language scientist”) that seems congenitally unable to grasp this point. Syntax is a model, not the final physical explanation. Grammaticality exists as a phenomenon in the world, and we try to explain its operation in as concise a way as possible. Entirely too many first-year graduate students in Linguistics come away with the idea that Syntacticians really honestly believe that there are explicit trees in our heads and that “movement” is a genetically-specified neurological operation. Poppycock, obviously, but because it is obviously poppycock and because they somehow form the impression in spite of us that we literally believe this, it’s easy to understand why we’re so often the objects of their ridicule. I appreciate that Carnie’s book takes the time to refute this view clearly. It shouldn’t have to, of course (responding to straw men isn’t really in an intro textbook’s job description, after all), but it does everyone a service by recognizing the need and addressing the issue anyway.

I just wish we could drop all the talk about UG. Yes, it’s there in some sense, but it’s like Global Warming, really. We know it’s happening, but we don’t know to what extent, what the implications are, or even exactly what the mechanism is (there’s still debate as to the extent of human complicity), let alone what to do about it. Yes, there’s some sense in which there are universal grammar rules for all human languages. But we don’t know exactly where this comes from, on what level it operates, or even how pervasive it is in the real explanations for these similarities. So let’s stow it, please, until people qualified to address these questions can do so.

on Jun 7th, 2008

Unintentional Self-Parody

An actual quote from a blog about language:

The fact that “one of the only” is a common phrase, found everywhere, does not make it acceptable English.

So what, one wonders, would make anything “acceptable English” for these people? Is there any mechanism for making something “acceptable English” OTHER than the fact that it “is a common phrase, found everywhere”?

Alright, granted, if you split the bits apart, then it doesn’t make semantic sense for “one of” to be used with “only,” since “only” is only semantically compatible with single units, and “one of” implies plurality. I get it. But neither does “kick the bucket” have anything obvious to do with death, and yet I’m in the habit of parsing it that way rather than in its potential literal meaning. I could rattle off hundreds of similar examples, but why bother? Anyone with even a passing knowledge of Linguistics is aware of chunking, aware of idioms, and bloody well aware that native speaker intuitions are the measure of grammaticality in any language, not dusty semantics books about “the way things ought to be.”

Now, I’m not as dedicated an anti-prescriptivist as most linguists. I can see the case for lamenting a particular form if it destroys or blurs distinctions previously present in the language. One of my personal pet peeves on this score is the modern habit of using “utilize” as though it meant “use + I’m intelligent,” which it doesn’t, or didn’t use to. And I can see the case for lamenting use of a particular form if it is in some way deceptive, designed to cause false associations. Like calling socialists “liberals” when they stand for the opposite of relaxing controls on an economy. But in each of these cases the reason I am sympathetic to prescriptivists is that there is something to be gained in the space of semantic coverage by listening to them.

I simply can’t see what is to be gained by picking nits about “one of the only.” It’s a chunk. An idiom. We know what it means without having to open the hood and tinker with the bits. There is no semantic distinction that’s being blurred, and the standard-use meaning of “only” is not under any threat. Indeed, it is ironically partly the contrast with “one of the only” that sees to this. If I say “I’m the only one who passed,” then it’s clear to you that no one else but me passed the test, in some part because I didn’t say (but potentially could have said) “I’m one of the only ones who passed.”

One could argue, if one were excessively silly, that “one of the only ones” imposes a cognitive burden because of a garden path effect whereby the listener must go back and reparse. (And I’m equally sure there’s a researcher with access to an fMRI gadget and too much government money willing to take colored pictures in red and blue of your brain and pronounce you CORRECT.) But this burden can’t be terribly large. Indeed, I’ve never been in the presence of any ambiguity caused by “one of the only.” It’s more the kind of thing that Gallagher would pick at for a cheap laugh. (You know, Mr. “we park on a driveway and drive on a parkway! It’s messed up!” Yuk yuk.)

So this is unintentional self-parody of the first order. Awesome.