Friday, October 30, 2009

Initial letters in the news

It's not exactly anacrophony, but the initial letters of words are in the news, thanks to Governor Schwarzenegger's recent veto message. That's close enough to the topic of this blog to warrant a comment!

In all the news reports, there are various estimates for the probability that Schwarzenegger's message happened by chance. Let's look at a very restrictive version of the question: How likely is it that the initial letters of seven words taken at random from English language text will spell out ‘f— you’?

The numbers that have appeared in the press are questionable. Even where people have taken into account the fact that some letters are more common than others, they haven't taken into account that these are initial letters, whose frequencies differ from those of letters in general.

So what are the frequencies at which letters appear as the initial letters of words in written English? I had no idea. So I downloaded plain text versions of two books from Project Gutenberg and analyzed them. The two books were Phineas Finn by Anthony Trollope and Following the Equator by Mark Twain. After some editing (to keep ‘Chapter XXI’ from adding to the count for the letter X, for instance), I saved the books as .txt files and ran the following command (all on one line):

cat *.txt | tr "[:lower:]" "[:upper:]" | awk '{for(i=1;i<NF+1;i=i+1) alpha[substr($i,1,1)]+=1} END {sum=0; s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; for(i=1;i<27;i+=1){sum+=alpha[substr(s,i,1)]}; print sum; for(i=1;i<27;i+=1) {c=substr(s,i,1); print c " " 100*alpha[c]/sum}}'


I found that the letters C, F, K, O, U, and Y appeared at the beginnings of 3.6%, 3.5%, 0.7%, 5.9%, 1.0%, and 1.5% of the words in these books, respectively. Using these figures, it appears that the likelihood that the seven letters in Schwarzenegger's veto appeared by chance is about 1 in 1.3 trillion.
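The arithmetic behind that figure can be checked with a short awk sketch. It assumes, as the estimate does, that the seven initial letters are independent draws. The function name initial_letter_odds and the letter-count table are illustrative and were not part of the original analysis.

```shell
# Sketch: multiply the initial-letter frequencies quoted above.
# Assumes the seven initial letters are independent draws.
# initial_letter_odds is a hypothetical helper name.
initial_letter_odds() {
  awk 'BEGIN {
    # Percent of words starting with each letter (figures from the post).
    pct["C"] = 3.6; pct["F"] = 3.5; pct["K"] = 0.7
    pct["O"] = 5.9; pct["U"] = 1.0; pct["Y"] = 1.5
    # Occurrences of each letter in the seven-letter message (U appears twice).
    n["C"] = 1; n["F"] = 1; n["K"] = 1; n["O"] = 1; n["U"] = 2; n["Y"] = 1
    p = 1
    for (c in pct)
      for (k = 0; k < n[c]; k++)
        p *= pct[c] / 100
    printf "1 in %.1f trillion\n", 1 / p / 1e12
  }'
}

initial_letter_odds
```

Because the rarest letters (K and the two Us) dominate the product, small changes in their measured frequencies would move the result noticeably.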

State government letters are neither Trollope nor Twain, but these numbers are certainly better than saying that the probability is 1 chance out of 26^7.
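For comparison, here is a rough sketch of the uniform baseline, in which each of the 26 letters is equally likely for all seven words (26^7 outcomes), set against the frequency-based estimate. The helper name compare_estimates is hypothetical, and the percentages are the ones quoted in this post.

```shell
# Sketch: compare the uniform 26^7 baseline with the frequency-based estimate.
# compare_estimates is a hypothetical helper name.
compare_estimates() {
  awk 'BEGIN {
    # Every letter equally likely: 26 choices for each of seven words.
    uniform = 26 ^ 7
    # Product of the quoted percentages for C, F, K, O, U (twice), and Y,
    # converted from percent to a probability.
    p = (3.6 * 3.5 * 0.7 * 5.9 * 1.0 * 1.0 * 1.5) / 100 ^ 7
    printf "uniform: 1 in %.0f\n", uniform
    printf "ratio: %.0f\n", (1 / p) / uniform
  }'
}

compare_estimates
```

The ratio line shows how many times less likely the message is under the initial-letter frequencies than under the uniform assumption.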

Saturday, September 19, 2009

Introducing the NAPA

Welcome!

After weeks of preparation, the Nearly Anacrophonic Phonetic Alphabet is ready to be presented to the public. If you haven't already, visit the NAPA home page and the NAPA design page to find out what this is all about. And please leave comments with suggestions for better words.

Again, welcome!

Guestbook

Welcome to the NAPA guestbook. Leave general comments here. If you have a suggestion about an individual entry of the NAPA, it might be more appropriate to leave it as a comment on the post of that entry — see the sidebar for the correct link.

Tuesday, September 15, 2009

The letter A

The code word for the letter A is aural.

A more common homophone: oral.

The letter B

The code word for the letter B is bdellatomy.

Bdellium would be another choice, but the OED lists it as non-naturalized.

The letter C

The code word for the letter C is ctenoid.

Other choices: channukah, czar, or the remarkable cnicnode. This last was coined by a mathematician, and seems less natural than ctenoid.

Unfortunately, the OED indicates that chthonic should be pronounced with an initial k sound.

The letter D

The code word for the letter D is djinn.

A homophone of gin.