Arvind Mahankali won the 2013 National Spelling Bee on "knaidel," a type of dumpling. The winning word the year prior was "guetapens," before that "cymotrichous," and before that "stromuhr." Have bee-winning words always been this insane, or is this a recent development?
It's tough to measure how hard a word is to spell, but we can measure how rare the word is, using Google Ngram. If you're not familiar, Google Ngram searches a corpus of digitized books to see how word use changes over time, as a percentage of all words in the corpus. We looked up the Ngram usage for all 86 winning words (lower, upper, and title case combined) for the year prior to the bee; you can find our full results below.
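For the curious, here is a minimal sketch of that lookup in Python. It assumes the Ngram series for each capitalization has already been downloaded into a plain dictionary. The function name `prior_year_usage`, the dictionary layout, and the numbers in `toy_series` are all made up for illustration; they are not real Ngram values or part of any Google API.

```python
def prior_year_usage(series_by_variant, word, bee_year):
    """Sum usage across the lower-, upper-, and title-case forms of `word`
    for the year before the bee. Missing variants or years count as 0."""
    year = bee_year - 1
    # A set avoids double counting if two case variants happen to coincide.
    variants = {word.lower(), word.upper(), word.title()}
    return sum(series_by_variant.get(v, {}).get(year, 0) for v in variants)


# Hypothetical placeholder counts (NOT real Ngram data), keyed by
# case variant and then by year.
toy_series = {
    "therapy": {1939: 300},
    "Therapy": {1939: 100},
    "THERAPY": {1939: 50},
}

print(prior_year_usage(toy_series, "therapy", 1940))
print(prior_year_usage(toy_series, "knaidel", 2013))
```

A word that never shows up in the dictionary simply comes back as 0, which matches how a word missing from the corpus would be scored.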
There are a lot of potential pitfalls in using Ngram results; one concern is that the data sample Google has access to isn't really representative of word usage in the wider culture. In particular, scientific and academic texts may be overrepresented; words like "asceticism," 1929's winner, may appear surprisingly common for this reason.
Here are the Ngram usages for all of the winning words, from 1925 to 2013. Words closer to the top are more common:
"Therapy" easily takes the top spot; it's widely used nowadays, but it was still plenty common 1939. (Also, it's not that hard to spell.) The rest of the list is below, sortable by usage or date:
That top chart makes it clear what the anomalously common words are, but everything else gets crushed to the bottom, so it's hard to see if there are any trends. To work around this, we can put the same data on a logarithmic scale (like the Richter Scale):
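As a rough illustration of why the logarithmic scale helps, the Python sketch below places a few values on a 0-to-1 axis both ways. The numbers are round figures chosen for the example, not values from the bee data, and `axis_positions` is a hypothetical helper name. It also assumes every value is at least 1; a usage of zero has no logarithm, so those words need separate handling that this sketch doesn't attempt.

```python
import math


def axis_positions(values, log=False):
    """Return each value's position on a 0-to-1 axis whose top is max(values).

    With log=True, positions are based on log10 of each value.
    Assumes every value is >= 1 and the largest value is > 1.
    """
    top = max(values)
    if log:
        return [math.log10(v) / math.log10(top) for v in values]
    return [v / top for v in values]


# Illustrative round numbers only, each 100 times larger than the last.
usages = [1, 100, 10_000, 1_000_000]

print(axis_positions(usages))            # linear: small values bunch up near 0
print(axis_positions(usages, log=True))  # log: values spread out along the axis
```

On the linear axis the three smaller values all sit in the bottom 1% of the chart; on the log axis each hundredfold step takes up the same amount of room.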
Now that we have some breathing room, we can see that there seems to be a general downward trend, implying that winning words are becoming rarer. In particular, the last 20 years or so have seen a serious dip; this is more obvious if you add a LOESS curve (a locally weighted smoothing line) to the chart. Of the 15 rarest winners in bee history, 11 have come since 1995:
Rank | Year | Winning Word | Usage per 100 billion words (prev. year) |
T-1 | 1955 | crustaceology | 0 |
T-1 | 1962 | esquamulose | 0 |
T-1 | 1980 | elucubrate | 0 |
T-1 | 2002 | prospicience | 0 |
T-1 | 2011* | cymotrichous | 0 |
T-6 | 2012* | guetapens | 5 |
T-6 | 2013* | knaidel | 5 |
8 | 2010* | stromuhr | 15 |
9 | 1996 | vivisepulture | 24 |
10 | 1999 | logorrhea | 53 |
11 | 1998 | chiaroscurist | 67 |
12 | 1995 | xanthosis | 87 |
13 | 1997 | euonym | 103 |
14 | 1979 | maculature | 115 |
15 | 2007 | serrefine | 137 |
*Google Ngram only provides data through 2008, so the frequencies of the 2010, 2011, 2012, and 2013 winners are all based on data from 2008. These words were all extremely rare in 2008, but I have no reason to think that their usage has appreciably changed in the last few years.
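The table is small enough to check by hand, but a short Python sketch makes two of the claims around it explicit: how many of the 15 rarest winners came in 1995 or later, and which winners fall back to 2008 data. The rows are copied from the table above; `lookup_year` and `NGRAM_LAST_YEAR` are labels invented for this example, not anything Ngram itself provides.

```python
# (bee year, winning word, usage per 100 billion in the lookup year),
# copied from the table of the 15 rarest winners.
rarest = [
    (1955, "crustaceology", 0),
    (1962, "esquamulose", 0),
    (1980, "elucubrate", 0),
    (2002, "prospicience", 0),
    (2011, "cymotrichous", 0),
    (2012, "guetapens", 5),
    (2013, "knaidel", 5),
    (2010, "stromuhr", 15),
    (1996, "vivisepulture", 24),
    (1999, "logorrhea", 53),
    (1998, "chiaroscurist", 67),
    (1995, "xanthosis", 87),
    (1997, "euonym", 103),
    (1979, "maculature", 115),
    (2007, "serrefine", 137),
]

NGRAM_LAST_YEAR = 2008  # last year with data, per the footnote


def lookup_year(bee_year, last_year=NGRAM_LAST_YEAR):
    """Year prior to the bee, capped at the last year with data."""
    return min(bee_year - 1, last_year)


# Winners from 1995 onward among the 15 rarest.
since_1995 = [word for year, word, _ in rarest if year >= 1995]

# Winners whose lookup year had to be capped (the starred rows).
capped = sorted(year for year, _, _ in rarest if lookup_year(year) != year - 1)

print(len(since_1995), "of", len(rarest))
print(capped)
```

The capped years line up with the starred rows in the table, and the count of recent winners comes straight from the Year column.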
The five words tied for first stumped Google Ngram: they didn't appear in the corpus at all in the year prior to the bee. "Esquamulose," meaning "not covered in scales," wins rarest word, period; it doesn't appear in the corpus for any year from 1800 to 2008 (a word has to appear in at least 40 books to show up in the database).
Appropriately, "esquamulose" was only a winning word in a technical sense. The 1962 contest ended in a tie after both finalists failed to spell it correctly, the first and only time that the bee has ended in such a way.