We’ve already seen how to use Markov chains to generate random words that capture the essence of a previously analyzed corpus. The exact same algorithm can be applied to text: the base entities become words instead of letters. I make punctuation part of the entities, so that sentence flow becomes part of the extracted statistical essence.
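To make the idea concrete, here is a minimal sketch of a word-level Markov chain in Python. It is not the code behind this site: the function names (`tokenize`, `build_chain`, `generate`), the `order` parameter and the choice to treat each punctuation mark as its own token are all assumptions made for illustration.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: each word and each punctuation mark is a separate entity.
    return re.findall(r"\w+(?:['’]\w+)*|[^\w\s]", text)


def build_chain(tokens, order=1):
    # Map each run of `order` tokens to the list of tokens seen right after it.
    # Duplicates are kept, so frequent followers are picked more often.
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain


def generate(chain, length=30, seed=None):
    # Walk the chain from a random starting state, up to `length` tokens.
    if not chain:
        return []
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state only appeared at the end of the corpus
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return out


if __name__ == "__main__":
    corpus = "The cat sat. The dog sat, too. The cat ran!"
    chain = build_chain(tokenize(corpus), order=1)
    print(" ".join(generate(chain, length=20)))
```

Because commas and periods are tokens like any other, the chain learns where sentences tend to pause and end. A higher `order` sticks closer to the source text, while `order=1` gives the loosest, most surprising output.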
Feel free to send me ideas of cool corpora to analyze.
You can play with it here:
Here are some ideas on what you could compile, whether mashups or standalones:
– Obama’s speeches + Fight Club
– The Star Wars scripts
– Stephen Hawking’s The Universe in a Nutshell (or any other [kick]astrophysics book).
– James Joyce’s Finnegans Wake. I would actually be very interested to see this one. What would it bring up? Maybe the compiled text would make actual sense, lol!
– Poetry. Pablo Neruda’s Canto General for instance.
– All of Lovecraft’s works.
Well, the problem is that I’m kind of limited to what’s in the public domain, for 2 reasons:
– I’m cheap
– it’s easy to find electronic copies in plain text on sites like http://www.gutenberg.org/
So the only thing I found in there was a couple of texts from Lovecraft. I’ve added them to the corpora and am redoing the analysis; you’ll see a new option shortly.
Thanks for the suggestions 🙂
Someone had fun compiling various books with Markov: Hamlet and Alice in Wonderland, The Book of Yellow and The Egyptian Book of the Dead, and (my favorite) Alice in Wonderland and The Book of Revelation of St John.
The Apocalypse according to Alice. It works really well.
Yeah, it’s not new; the interesting part is applying the algorithm to a lot more. I’m starting with words, then text, but music is coming soon. It’s something I worked on a few years ago and the results are very, very cool. I have a few other ideas of things to apply it to as well (images), but that’s a bit more experimental.
Oh yeah! With music it could really rock!