Saturday, November 7, 2009

Popularity of Words for Numbers

I came across a list of the 15,000 most common words in the English language. The list is from the British National Corpus (BNC), a 100-million-word compilation of spoken and written (British) English from the late twentieth century. Each of the 15,000 words comes with a word frequency: the count of how many times the word appears in the BNC.

Given such a nice data set, I couldn’t resist asking it a quick and fun question: Do the words for numbers rank the same as their numerical order? That is, would one be more frequent than two, and two be more frequent than three, and so on?
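For anyone who wants to ask the same question of their own word list, here is a minimal sketch of the check in Python. The function name `order_breaks` and the `toy` counts are my own inventions for illustration; the counts are made-up placeholders, not the BNC figures.

```python
from typing import Mapping, Sequence


def order_breaks(freqs: Mapping[str, int], words: Sequence[str]) -> list[tuple[str, str]]:
    """Return each adjacent pair (a, b) where b is at least as frequent as a.

    `words` is expected to be in numerical order ("one", "two", ...).
    An empty result means frequency strictly decreases along the list.
    """
    return [(a, b) for a, b in zip(words, words[1:]) if freqs[b] >= freqs[a]]


# Made-up placeholder counts, NOT taken from the BNC.
toy = {"one": 900, "two": 500, "three": 300, "four": 310}
print(order_breaks(toy, ["one", "two", "three", "four"]))
```

With the toy counts, the only pair reported is ("three", "four"), since the invented count for four exceeds the one for three.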

For one through nine, the answer is yes. However, ten through twenty is a different story.

Because ten through twenty includes words with very low frequencies relative to one, let’s redo the above chart with a logarithmic scale for frequency. Now we can better see the relative differences of the lower-frequency words.

Ten, twenty, and fifteen are roundish numbers, so we shouldn’t be surprised by their breaking the pattern. We can also give twelve an exemption because of its prominence in various units (twelve months to the year, twelve inches to the foot, and so on).

Excluding those, eleven and thirteen continue the pattern established from one to nine. But then fourteen and sixteen go the wrong way, exceeding thirteen in popularity. Seventeen gets back in line, with frequency less than thirteen.

Is it only the prime numbers that can remain well-behaved teens? No, the next prime, nineteen, has a higher frequency than not just seventeen but also thirteen and eleven. I suspect that nineteen’s popularity stems from its use in dates, which might also explain eighteen’s place.

I could go on, but suffice it to say, multiple factors are at play. So, in the name of stopping while this exercise can still be classified as “quick and fun,” I hereby stop.

For those in need of even more obscure numbers about words, I direct you to The Prime Lexicon, a list of words that are prime numbers when expressed in base 36.
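If you want to test a word yourself, a rough sketch follows. It assumes the usual base-36 convention, in which the digits 0 through 9 keep their values and the letters a through z stand for 10 through 35; I have not confirmed that The Prime Lexicon uses exactly this mapping. The function names are mine, and the primality test is plain trial division, which is fine for short words but slow for long ones.

```python
def word_value(word: str) -> int:
    """Read a word as a base-36 numeral (a=10, b=11, ..., z=35; assumed convention)."""
    return int(word.lower(), 36)


def is_prime(n: int) -> bool:
    """Trial division by 2 and then by odd numbers up to the square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def is_prime_word(word: str) -> bool:
    """True if the word, read in base 36, is a prime number."""
    return is_prime(word_value(word))


for w in ["a", "b", "be"]:
    print(w, word_value(w), is_prime_word(w))
```

Under that convention, "b" is 11 and therefore prime, while "a" is 10 and "be" is 11 × 36 + 14 = 410, neither of which is prime.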
