Friday, October 30, 2009

Initial letters in the news

It's not exactly anacrophony, but the initial letters of words are in the news, thanks to Governor Schwarzenegger's recent veto message. That's close enough to the topic of this blog to merit a comment!

In all the news reports, there are various estimates for the probability that Schwarzenegger's message happened by chance. Let's look at a very restrictive version of the question: How likely is it that the initial letters of seven words taken at random from English language text will spell out ‘f— you’?

The numbers that have appeared in the press are questionable. Even in those cases where people have taken into account the fact that some letters are more common than others, they haven't taken into account that these are initial letters.

So what are the frequencies at which letters appear as the initial letters of words in written English? I had no idea. So I downloaded plain text versions of two books from Project Gutenberg and analyzed them. The two books were Phineas Finn by Anthony Trollope and Following the Equator by Mark Twain. After some editing (to keep ‘Chapter XXI’ from adding to the count for the letter X, for instance), I saved the books as .txt files and ran the following command (all on one line):

cat *.txt | tr "[:lower:]" "[:upper:]" | awk '{for(i=1;i<NF+1;i=i+1) alpha[substr($i,1,1)]+=1} END {sum=0; s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; for(i=1;i<27;i+=1){sum+=alpha[substr(s,i,1)]}; print sum; for(i=1;i<27;i+=1) {c=substr(s,i,1); print c " " 100*alpha[c]/sum}}'
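For readers who would rather not untangle the awk, here is a rough Python sketch of the same counting idea. It is not the command I actually ran; the function name initial_letter_percentages and the sample sentence are made up for illustration. Like the awk version, it splits the text on whitespace and only counts words whose first character is a letter from A to Z, so a word that starts with a quotation mark or a digit is ignored.

```python
import string
from collections import Counter


def initial_letter_percentages(text):
    """Return (total, {letter: percent}) for words that start with A-Z.

    Hypothetical helper mirroring the awk one-liner: words are split on
    whitespace, and only the first character of each word is examined.
    """
    counts = Counter()
    for word in text.split():
        first = word[0]
        # Skip words that begin with punctuation, digits, or non-ASCII letters.
        if first.isascii() and first.isalpha():
            counts[first.upper()] += 1

    total = sum(counts.values())
    percentages = {
        letter: (100 * counts[letter] / total) if total else 0.0
        for letter in string.ascii_uppercase
    }
    return total, percentages


if __name__ == "__main__":
    # A made-up sample; in practice you would read the edited .txt files.
    sample = "Phineas Finn found four fine phrases"
    total, percentages = initial_letter_percentages(sample)
    print(total)
    for letter in string.ascii_uppercase:
        print(letter, percentages[letter])
```

To run it on real books, read each file's contents into one string and pass that string to the function.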


I found that the letters C, F, K, O, U, and Y appeared at the beginnings of 3.6%, 3.5%, 0.7%, 5.9%, 1.0%, and 1.5% of the words in these books, respectively. Using these figures, it appears that the likelihood that the seven letters in Schwarzenegger's veto appeared by chance is about 1 in 1.3 trillion.

State government letters are neither Trollope nor Twain, but these numbers are certainly better than saying that the probability is 1 chance out of 26^7.
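The arithmetic is easy to check. The sketch below multiplies the rounded percentages quoted above, assuming that each word's initial letter is independent of the others, and compares the result with the naive figure that treats all 26 letters as equally likely. The names freq, letter_counts, and chance_of_initials are only for illustration. The message uses U twice and each of the other five letters once.

```python
import math

# Initial-letter frequencies from the two books, as fractions (rounded
# values quoted in the post).
freq = {"C": 0.036, "F": 0.035, "K": 0.007, "O": 0.059, "U": 0.010, "Y": 0.015}

# How many times each letter is needed for the seven-letter message.
letter_counts = {"F": 1, "U": 2, "C": 1, "K": 1, "Y": 1, "O": 1}


def chance_of_initials(letter_counts, freq):
    """Probability of the initials, assuming independent words."""
    return math.prod(freq[letter] ** n for letter, n in letter_counts.items())


p = chance_of_initials(letter_counts, freq)
one_in = 1 / p          # on the order of a trillion
uniform = 26 ** 7       # the naive estimate: every letter equally likely

print(f"1 in {one_in:,.0f} using initial-letter frequencies")
print(f"1 in {uniform:,} assuming all letters are equally likely")
```

Because the percentages are rounded to one decimal place, the result is only good to about two significant figures.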

Saturday, September 19, 2009

Introducing the NAPA

Welcome!

After weeks of preparation, the Nearly Anacrophonic Phonetic Alphabet is ready to be presented to the public. If you haven't already, visit the NAPA home page and the NAPA design page to find out what this is all about. And please leave comments with suggestions for better words.

Again, welcome!

Guestbook

Welcome to the NAPA guestbook. Leave general comments here. If you have a suggestion about an individual entry of the NAPA, it might be more appropriate to leave it as a comment on the post of that entry — see the sidebar for the correct link.

Tuesday, September 15, 2009

The letter A

The code word for the letter A is aural.

A more common homophone: oral.

The letter B

The code word for the letter B is bdellatomy.

Bdellium would be another choice, but the OED lists it as non-naturalized.

The letter C

The code word for the letter C is ctenoid.

Other choices: channukah, czar, or the remarkable cnicnode. This last was coined by a mathematician, and seems less natural than ctenoid.

Unfortunately, the OED indicates that chthonic should be pronounced with an initial k sound.

The letter D

The code word for the letter D is djinn.

A homophone of gin.

The letter E

The code word for the letter E is ewe.

A homophone of you. Other options for E include the various eu- words.

The letter F

The code word for the letter F is fantasm.

This is one of the three words that make the NAPA only nearly anacrophonic.

The letter G

The code word for the letter G is gneiss.

A homophone of nice. Other choices for G include gnu, gnat, and gnome.

The letter H

The code word for the letter H is heir.

A homophone of air.

The letter I

The code word for the letter I is ingénue.

Other possibilities for I include irk and Iatmul.

The letter J

The code word for the letter J is jipijapa.

The OED includes fewer words than you might expect that have an initial J pronounced as H. I chose this one, instead of a more common word like jicama, because it's so fun.

The letter K

The code word for the letter K is knead.

A homophone of the more common need.

The letter L

The code word for the letter L is llareta.

The letter L is the only letter for which I had to resort to Webster over the OED. If you have any better ideas, please post a comment!

The letter M

The code word for the letter M is mneme.

The letter N

The code word for the letter N is ngoma.

The letter O

The code word for the letter O is oneing.

Other options include Oaxaca, oenology, and one. It's tempting to have oenology be a word in the NAPA, but oneing won out.

One was a contender, and it has the advantage of being a homophone, but it is more common than its homophone won, and that defeats the whole purpose.

The letter P

The code word for the letter P is pteris.

This is a homophone of terrace. There are many other P words to choose from, including the various ph-, ps-, and pt- words, but the only other homophone I could come up with was psylla, which is homophonic to Scylla.

The letter Q

The code word for the letter Q is qi.

The letter R

The code word for the letter R is rath.

This is a homophone of wrath. It's another example where I had to resort to acrophony.

The letter S

The code word for the letter S is segar.

The third acrophonic word in the NAPA. I considered sgraffito and seidel, but segar seemed best.

The letter T

The code word for the letter T is Tlingit.

The letter U

The code word for the letter U is uakari.

The letter V

The code word for the letter V is voetganger.

The letter W

The code word for the letter W is wrest.

Homophonic to the more common rest.

The letter X

The code word for the letter X is Xhosa.

Xi (the Greek letter) would be another possibility, and the OED's first pronunciation starts with an /s/ sound. But there are too many alternate pronunciations to make it a comfortable choice.

The letter Y

The code word for the letter Y is yttric.

The letter Z

The code word for the letter Z is zwieback.