The Wordiest Word

With Markov chain based random word generation, I essentially have tables of probabilities for letter sequences. With these, I’ve always wanted to know what the most English word is: the word in which each letter has the highest probability of following its predecessors.

I finally bit the bullet and produced it. Well, them, because the result varies depending on the corpus and depth used. All in all it’s not that impressive, just kind of cool to know. I don’t know what I was expecting; perhaps some amazing word that would rock my socks off.

Without further ado, here they are:

Corpus                         Depth  Wordiest Word
basic_english_words            1      st
basic_english_words            2      st
basic_english_words            3      struction
basic_english_words            4      statement
basic_english_words            5      store
unabridged_english_dictionary  1      prerererererererere…
unabridged_english_dictionary  2      press
unabridged_english_dictionary  3      press
unabridged_english_dictionary  4      preconcer
unabridged_english_dictionary  5      preconcertification
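
For the curious, here is a minimal Python sketch of how a search like this might work. It is not the code that produced the table above; it assumes a greedy walk that always picks the most frequent next letter given the previous `depth` letters. The `START` and `END` markers, the function names and the `max_length` cutoff are all made up for illustration. A greedy walk is also only an approximation of "highest probability overall", and it can loop forever, which would be consistent with the depth 1 result for the unabridged dictionary.

```python
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word boundary markers


def build_table(words, depth):
    """Count which symbol follows each context of `depth` symbols."""
    table = defaultdict(Counter)
    for word in words:
        symbols = [START] * depth + list(word) + [END]
        for i in range(depth, len(symbols)):
            context = "".join(symbols[i - depth:i])
            table[context][symbols[i]] += 1
    return table


def wordiest_word(table, depth, max_length=20):
    """Greedily follow the most frequent next letter until END or max_length."""
    context = START * depth
    letters = []
    while len(letters) < max_length:
        next_symbol, _count = table[context].most_common(1)[0]
        if next_symbol == END:
            break
        letters.append(next_symbol)
        # Slide the context window forward by one symbol.
        context = (context + next_symbol)[-depth:]
    return "".join(letters)


if __name__ == "__main__":
    corpus = ["aba", "ab", "ab"]  # toy corpus, not a real word list
    print(wordiest_word(build_table(corpus, 1), 1))
```

The `max_length` cutoff is what stops a looping result from running forever; without it, a corpus where "r" is most often followed by "e" and "e" by "r" would never terminate.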

PHP file upload to a Google storage bucket

Download

bucket_upload_1.0.php.gz

Google setup & use

1- Create a storage bucket for the script to upload into

Go to the Google Cloud Console and click on “Storage”, then “Browser”.

Click “Create Bucket”.

Give it a name and click “Create”.

2- Create a service account for the script

Expand the “IAM & admin” section and click on “Service accounts”.

Click “Create Service Account”.

Give it a name, check “Furnish a new private key”, select JSON as the key type, and click “Save”.

You are prompted to download a JSON credentials file; save it in a safe location.

3- Grant “Object Creator” permissions on the bucket to the service account

Go back to the storage bucket you created.

Edit its permissions.

The JSON credentials file you just downloaded contains the email address of the service account you created. Copy it.

Paste it into the “Add members” field and select the “Storage Object Creator” permission. This service account doesn’t need permission for anything other than dumping files in there, not even viewing them.

Optional: if you want the files uploaded by the script to be publicly viewable, add the permission “Storage Object Viewer” to the user “allUsers”. Accounts are all referred to by email in Google land, but there exist special keywords such as “allUsers”.
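
If you would rather not open the credentials file by hand to find the service account email used in step 3, a few lines of Python can pull it out. This is only a convenience sketch, separate from the upload script; it relies on the key file storing the address under the `client_email` field, and the function name is made up for illustration.

```python
import json


def service_account_email(credentials_file_path):
    """Return the `client_email` field of a service account JSON key file."""
    with open(credentials_file_path, encoding="utf-8") as f:
        credentials = json.load(f)
    return credentials["client_email"]
```

Call it with the path to the file you downloaded in step 2 and paste the result into the “Add members” field.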

Done with the Google setup 🙂

4- Running the script

If you haven’t already, download the script at the top of this page. Decompress it and edit the config at the top.

$credentials_file_path is the full path to the JSON credentials file you got from Google when you created the service account. It should be a secure location.

$destination_bucket_name is the name of the bucket you created.

$access_token_cache_file_path is a location where Google’s OAuth tokens are cached; it too should be a secure location.

Run the script with a single argument: the file you want to upload. The script can also be included and used outside of the CLI; in that case, simply call the upload( $filename ) function.

The script returns the URL to the file in the bucket.
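
The exact form of the URL the script returns isn’t shown here, but publicly viewable objects in a bucket are generally reachable at `https://storage.googleapis.com/<bucket>/<object>`. As a rough sketch under that assumption, this Python helper (the function name is hypothetical, and it is not part of the script) builds such a URL from a bucket name and object name:

```python
from urllib.parse import quote


def public_object_url(bucket_name, object_name):
    """Build the public URL of an object, percent-encoding the object name."""
    # Keep "/" unescaped so object names with folder-like prefixes stay readable.
    return "https://storage.googleapis.com/{}/{}".format(
        bucket_name, quote(object_name, safe="/")
    )
```

The URL only works for anonymous visitors if the optional “allUsers” permission from step 3 was added.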

Voilà!

A small milestone

The millionth penstroke on Mandalagaba since the code rewrite last February

There’s more data I’d like to pull out of this. For example, the average length of a stroke, the average time per stroke, how many human lives were spent drawing mandalas, et cetera :).
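
As a sketch of what those numbers could look like in code: the snippet below assumes, purely for illustration, that each stroke is a list of `(x, y, t)` points with `t` in seconds. That is not necessarily how Mandalagaba stores strokes, and the function names are made up.

```python
import math


def stroke_length(points):
    """Sum of straight-line distances between consecutive (x, y, t) points."""
    return sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(points, points[1:])
    )


def stroke_duration(points):
    """Time elapsed between the first and last point of a stroke."""
    return points[-1][2] - points[0][2] if points else 0.0


def average_length_and_duration(strokes):
    """Return (average length, average duration) over all strokes."""
    if not strokes:
        return 0.0, 0.0
    total_length = sum(stroke_length(s) for s in strokes)
    total_duration = sum(stroke_duration(s) for s in strokes)
    return total_length / len(strokes), total_duration / len(strokes)
```

Summing the durations instead of averaging them would give the total drawing time, which is the starting point for the "human lives spent" figure.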