With Markov chain based random word generation, I essentially have tables of probabilities for letter sequences. Because of this, I’ve always wanted to know what the most English word is: the word with the highest probability of each letter following its predecessors.
I finally bit the bullet and produced it; well, them, because the result varies depending on the corpus and depth used. All in all it’s not that impressive, just kind of cool to know. I don’t know what I was expecting; maybe some amazing word that would rock my socks off.
Without further ado, here they are:
Corpus | Depth | Wordiest Word |
basic_english_words | 1 | st |
basic_english_words | 2 | st |
basic_english_words | 3 | struction |
basic_english_words | 4 | statement |
basic_english_words | 5 | store |
unabridged_english_dictionary | 1 | prerererererererere… |
unabridged_english_dictionary | 2 | press |
unabridged_english_dictionary | 3 | press |
unabridged_english_dictionary | 4 | preconcer |
unabridged_english_dictionary | 5 | preconcertification |
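
For anyone curious how words like these can fall out of the probability tables, here is a rough Python sketch. It is not necessarily the code behind the table above. It assumes a greedy walk: start at the beginning of a word and keep taking the single most likely next letter given the previous `depth` letters, stopping at an end-of-word marker. The names `build_table` and `wordiest_word`, the `^` and `$` markers, and the `max_len` cutoff are all made up for illustration. A greedy walk like this would also explain the endless "rerere" loop at depth 1, since nothing stops it from cycling until a cutoff kicks in.

```python
from collections import Counter, defaultdict

START = "^"  # hypothetical padding marker for the start of a word
END = "$"    # hypothetical marker for the end of a word


def build_table(words, depth):
    """Count which letter follows each context of `depth` characters."""
    table = defaultdict(Counter)
    for word in words:
        padded = START * depth + word + END
        for i in range(depth, len(padded)):
            table[padded[i - depth:i]][padded[i]] += 1
    return table


def wordiest_word(table, depth, max_len=20):
    """Greedily follow the most likely next letter until END or max_len."""
    context = START * depth
    letters = []
    while len(letters) < max_len:
        # Most frequent follower of the current context (assumes depth >= 1).
        next_letter = table[context].most_common(1)[0][0]
        if next_letter == END:
            break
        letters.append(next_letter)
        # Slide the context window forward by one character.
        context = (context + next_letter)[-depth:]
    return "".join(letters)


# Toy corpus, made up purely for this example.
toy_words = ["st", "st", "st", "so", "at"]
print(wordiest_word(build_table(toy_words, 1), 1))  # prints "st"
```

Note that picking the best letter at every step is not guaranteed to find the word with the highest overall probability; it is just the simplest interpretation to sketch.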