Saturday, June 13, 2009

Crossing the threshold of hype

If you watched the video in the last post and you follow reputable language bloggers, you could probably guess that what caught my attention was Craig Crawford's acknowledgment of the Global Language Monitor's claim that English now has one million words.

But acknowledgment doesn't sound right to me. How about 'Crawford's duped acknowledgment...'? That's more like it.

I've written about Payack before, and so have the important language bloggers. The arguments haven't really changed, but they're worth repeating.

A relaxed definition of word would easily lead to several million words in the English language. At any point that you decide to limit the definition of word you've got an argument to make. Do we count rock and rocks as separate words? How about mouse and mice? How about the different tenses of verbs?
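To make that concrete, here is a minimal sketch of how the same handful of tokens gives two different totals depending on whether inflected forms count separately. The token list and the hand-written `LEMMAS` table are invented for illustration; nothing here reflects how Payack or anyone else actually counts.

```python
# Toy illustration only: the token list and the lemma table are made up.
tokens = ["rock", "rocks", "mouse", "mice", "walk", "walked", "walking"]

# Hypothetical hand-written table mapping inflected forms to a headword.
LEMMAS = {
    "rocks": "rock",
    "mice": "mouse",
    "walked": "walk",
    "walking": "walk",
}


def count_word_forms(tokens):
    """Count every distinct spelling as its own word."""
    return len(set(tokens))


def count_lemmas(tokens, lemmas):
    """Count inflected forms together with their headword."""
    return len({lemmas.get(token, token) for token in tokens})


form_count = count_word_forms(tokens)
lemma_count = count_lemmas(tokens, LEMMAS)
print(form_count, lemma_count)
```

Same seven tokens, two defensible answers. Neither one is wrong; they just answer different questions.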

Once we get past such grammatical distinctions we have the hard part. Certainly teeter-totter and seesaw and hickey-horse should be counted as words distinct from each other, but what about potato bug referring to the Jerusalem cricket and potato bug referring to the woodlouse? Is potato bug1 distinct from potato bug2?

And then we have words like tubular, which used to mean resembling a tube; then during my childhood I learned it as cool, far out, groovy, outasight. One word or two? Does the second meaning even count as a word? How do we count slang?

How about ginormous? Fucktastic? Krunk? Bevemirage? The arguments about what is and what isn't a word immediately dissolve Mr Payack's claims that on June 10, 2009 at 5:22 GMT the millionth word entered the English language. The only way this determination is even theoretically defensible is if Payack and his algorithm were able to account for every slang word and every bit of jargon and every portmanteau and sandwich word and regionalism and simply say: when you count everything, without argument about what should be counted, there are X words known to and used by English speakers.

And that's only theoretically possible. And the count would be many times what Payack says it is. Especially if phrases like "wardrobe malfunction" are counted as words. How about other compositionally predictable items like "terrorist attack" or "computer program"? If they occur together enough, are they single words in addition to the individual words they comprise?

But he claims that his number is only an estimate and it's meant to celebrate the globalisation of English. We already know that English is global and we could have celebrated it a long time ago. And there's no reason to celebrate the threshold now just because he has marked the date.

According to a barely skeptical story:

[Payack's] computer models check a total of 5,000 Web sites, dictionaries, scholarly publications and news articles to see how frequently words are used, he said. A word must make 25,000 appearances to be deemed legitimate.

So it's a late celebration if we decide a word needs 10,000 appearances from 10,000 sources. And it's a very early celebration if we decide 30,000 appearances on 2,500 sources is necessary. And that is if we agree on a standard of word-form count.
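A rough sketch of why the cutoff matters: the candidate words below come from this post, but every appearance and source figure is invented, and `count_legitimate` is a hypothetical helper, not Payack's model. The only number taken from the quoted story is the 25,000-appearance threshold.

```python
# Toy data: candidate -> (total appearances, distinct sources).
# All figures are invented for illustration.
CANDIDATES = {
    "ginormous": (40_000, 3_000),
    "krunk": (26_000, 1_200),
    "bevemirage": (12_000, 400),
    "tubular": (9_000, 2_600),
}


def count_legitimate(candidates, min_appearances, min_sources=0):
    """Count candidates that meet both the appearance and the source cutoff."""
    return sum(
        1
        for appearances, sources in candidates.values()
        if appearances >= min_appearances and sources >= min_sources
    )


reported = count_legitimate(CANDIDATES, 25_000)         # the cutoff quoted above
looser = count_legitimate(CANDIDATES, 10_000)           # a more permissive cutoff
stricter = count_legitimate(CANDIDATES, 30_000, 2_500)  # stricter on both counts
print(reported, looser, stricter)
```

Move the cutoff and the total moves with it, so the date of the "millionth" word is a product of the settings as much as of the language.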

Craig Crawford's home turf is CQ Politics, not Language Log or Visual Thesaurus. So we can't expect his bullshit sensor to be as well-tuned on issues of lexicography. But there is a tendency to believe a sparkly press release merely because it would be cool for it to be true. And the coverage of Payack's pronouncement has been more eager than investigative. The linguists are usually included as mere dissenters: stingy academics stifling the entrepreneurial spirit. There are exceptions.

A BBC4 segment pitted Payack against Ben Zimmer on level ground. With the opportunity to speak plainly in response, Zimmer shut down the claims pretty easily. When PRI's The World reran the story, the silliness of such claims was pushed even further to the fore, with David Crystal's reasonable voice adding some lovely and firm criticism.

The relevant segment takes up the first 10 minutes.

Even the host, Patrick Cox, speaks with a clearly dismissive tone, not just of Payack, but of the headline writers who were "the only people who seemed to like the story and the declaration."

Bravo Mr Cox. Bravo.

1 comment:

  1. In these sorts of discussions (define 'a word'), I think it's always better (safer?) to be more inclusive than exclusive - like you were saying, 'horse' and 'horses' should be considered two different words, and so on.

    I say 'should be,' but in linguistics there is very little room for 'should-be's.' Hey! I just invented the 1,000,001st word - 'should-be!'

    Thanks for another fucktastic post.
