Sunday, October 05, 2008

Not worth investigating. But...

The maddening antics of Paul Payack have caught the attention of CNN. (We know Payack from his claims about the English word count. He's still making ridiculous claims about the size of the vocabulary, as though it were something worth pinning to a specific number. That's like counting the teaspoons of water in a rainstorm. How would you decide what to count, and how in the world would you do it?) His latest gimmick is commentary on the debates. The CNN story reports:

An analysis carried out by a language monitoring service said Friday that Gov. Sarah Palin spoke at a more than ninth-grade level and Sen. Joseph Biden spoke at a nearly eighth-grade level in Thursday night's debate between the vice presidential candidates.

Yes, numbers of that sort are possible to produce. Most word processors have some sort of readability calculator. But here are some red flags.

Payack says nothing about what grade level means. The story then jumps straight from grade level to passive/active voice statistics, and then to a simple word count, which is pretty meaningless in a timed back-and-forth exchange. Payack counts words per sentence, which is problematic because of the disfluencies and truncated sentences that always occur in speech. Do you count a repeated word twice? Do you count a truncation and restructuring as a single sentence or as two (or more) separate sentences? Payack also offers a number for ease of reading. For speech. Have you ever tried reading a faithful transcription? Few people have much experience reading extended passages of faithfully transcribed speech. It's choppy. It's full of repetitions and ungrammatical segments. So how is ease of reading determined? Payack bases all this (with some sort of modification) on the Flesch-Kincaid formula, which, according to Wikipedia, would rank a single monosyllabic word as the easiest possible reading. On Payack's scale, 100 is the easiest to read or hear; the Flesch-Kincaid system puts the easiest possible score at about 121. I don't need to investigate the Flesch-Kincaid formula. Even if it is legitimate,† I trust that Payack knows how to butcher it for his own benefit. Lastly, evidence that Payack found a hammer and thought everything looked like a nail: he gives us a statistic for the number of sentences per paragraph.

While writing that last paragraph, I struggled with whether to break it apart. I promised a list of red flags, and because each item is a red flag with only a little discussion, I decided to keep it intact. But I might have split it into smaller, more manageable sections for ease of reading. I probably should have. When speaking, we don't do anything like that. There are no paragraphs in oral language. Yes, there are changes in direction and occasional obvious changes in topic or approach. But the paragraph is a writing convention that has no hard correlation to a structure in discourse. At least not anything that's worth attaching a number to.

These habits of statistical assurance make me wonder: Does Payack like naming every bird that he hears flying outside his window?
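For the record, the 121 ceiling is easy to reproduce. Below is a minimal Python sketch of the reading-ease half of the Flesch-Kincaid tests, using the standard published constants. It is not Payack's version, since he doesn't say what his modification is. The function takes word, sentence, and syllable counts as inputs rather than counting them itself, because with transcribed speech those counts are exactly where the judgment calls live. The 20-word, 26-syllable passage is a hypothetical illustration, not data from the debate.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch reading-ease score; higher means easier.

    The caller supplies the counts. Deciding what counts as a word,
    a sentence, or a syllable in transcribed speech is left to you.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# The "easiest" possible input: one sentence made of one monosyllabic word.
ceiling = flesch_reading_ease(words=1, sentences=1, syllables=1)
print(round(ceiling, 2))  # 121.22

# Hypothetical stretch of speech (20 words, 26 syllables) transcribed
# either as one run-on sentence or as a restart plus a second sentence.
as_one = flesch_reading_ease(words=20, sentences=1, syllables=26)
as_two = flesch_reading_ease(words=20, sentences=2, syllables=26)
print(round(as_one, 2), round(as_two, 2))
```

Same words, same syllables: only the transcriber's decision about where a sentence ends changes, and the score moves by about ten points.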

† I do have some reservations about the system. Given any two sentences whose words have the same average syllable count, it rates the shorter sentence as easier. That claim alone is worth its own post. Further, the Wikipedia pages on Rudolf Flesch, J. Peter Kincaid, and their formula are full of unsupported and biased claims. One example: the article on Flesch includes the following passage.
Flesch practiced what he preached. His writing is clear, vigorous, and plain; his style is direct and energizing. Those who read How to Write Plain English often comment that his writing motivates them to write more plainly. For example, here is Flesch on clearing up legalese:

The shill who wrote that then provides a sample (which I don't need to include) of Flesch's writing that doesn't actually address clearing up legalese. It's mostly a complaint against the view that complex ideas need complex language. It's a valid complaint. But it's not a clearly written one. I would hope it's not Flesch's best work.
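The first reservation in the footnote is easy to check against the standard Flesch-Kincaid grade-level formula. In this minimal Python sketch (published constants; the sentence lengths are arbitrary illustrations), syllables per word is held fixed at 1.5. The grade then depends only on words per sentence, so every shorter sentence comes out "easier" no matter what its syntax is doing.

```python
def fk_grade_level(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level; lower means easier."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hold syllables per word at exactly 1.5 and lengthen a single sentence
# two words at a time (2, 4, ..., 40 words).
grades = [fk_grade_level(words=n, sentences=1, syllables=3 * n // 2)
          for n in range(2, 41, 2)]

# Each longer sentence scores strictly higher (harder) than the one before;
# nothing in the formula looks at word order or sentence structure.
assert all(a < b for a, b in zip(grades, grades[1:]))

# Splitting a 20-word sentence into two 10-word sentences, with the same
# words, lowers the grade by 0.39 * 10 = 3.9 levels.
drop = fk_grade_level(20, 1, 30) - fk_grade_level(20, 2, 30)
print(round(drop, 2))  # 3.9
```

The formula has only two inputs, sentence length and word length, so any difficulty that lives in the syntax is invisible to it.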


  1. About paragraphs in speech: some guy at work just informed me that he thinks about how he would punctuate anything he says.

    I cannot conceive of that actually being true, regardless of whether he really thinks it is.

  2. There you go with politics again! ;)

  3. For me, the most interesting thing about the vice presidential candidate's speech habits is how analysts treat the colloquialisms favored by Palin. I've also noticed that she shares a lot of verbal habits with the current president but receives an entirely different kind of criticism and/or praise. Just some observations!

    Word Watcher

  4. Did you see the post at Motivated Grammar about this? He makes some good points about how, in speech, you decide what counts as a compound sentence and what counts as two or three simple ones, and how that affects the score. One of his examples:

    “I know that you believe that you understood what you think I said, but I am not sure you realize that what you heard is not what I meant.

    That is a hard sentence to understand, not because any of the words are difficult, but rather because the syntax is extremely complex. This is reflected by its grade level, which according to Google Docs is 9.0. But if we split the sentence in two by replacing the comma with a period, the grade level plummets to 3.0, because each of these two sentences is short, with short words."

  5. I saw that fine piece. And it's precisely the type of follow-up post that I hinted at in my footnote.

    (link to the thoughts at Motivated Grammar)


