Markov chains based random text generation

We’ve already seen how to use Markov chains to generate random words based on the essence of a previously analyzed corpus. The exact same algorithm can be applied to text: the base entities become words instead of letters. I treat punctuation marks as entities too, so that sentence flow becomes part of the extracted statistical essence.
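
To make the idea concrete, here is a minimal Python sketch of word-level generation with punctuation kept as separate tokens. It is not the code behind the generator on this site; the function names (`tokenize`, `build_chain`, `generate`, `join_tokens`), the `order` parameter, and the set of punctuation marks are all assumptions chosen for illustration.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    # Split into words (apostrophes allowed inside) and single punctuation
    # marks, so that punctuation becomes an entity of its own.
    # The punctuation set here is an assumption, not the author's choice.
    return re.findall(r"[\w'’]+|[.,!?;:]", text)


def build_chain(tokens, order=1):
    # Map each run of `order` consecutive tokens to the list of tokens seen
    # right after it. Repeats are kept, so frequent followers are drawn
    # more often when sampling.
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain


def generate(chain, length=20, seed=None):
    # Start from a random known state and repeatedly pick a random follower.
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state only appeared at the end of the corpus
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return out


def join_tokens(tokens):
    # Glue punctuation back onto the preceding word.
    return re.sub(r"\s+([.,!?;:])", r"\1", " ".join(tokens))


if __name__ == "__main__":
    corpus = "The cat sat. The dog sat, then the cat ran. The dog ran!"
    chain = build_chain(tokenize(corpus), order=1)
    print(join_tokens(generate(chain, length=15)))
```

With `order=1` each word depends only on the previous token; raising `order` should make the output follow the corpus more closely, at the cost of less variety.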

Feel free to send me ideas of cool corpora to analyze.

You can play with it here:

5 Replies to “Markov chains based random text generation”

  1. Here are some ideas for what you could compile, be it mashups or standalones:

    – Obama’s speeches + Fight Club
    – The Star Wars scripts
    – Stephen Hawking’s The Universe in a Nutshell (or any other [kick]astrophysics book).
    – James Joyce’s Finnegans Wake. I would actually be very interested to see this one. What would it bring up? Maybe the compiled text would make actual sense lol!
    – Poetry. Pablo Neruda’s Canto General for instance.
    – All of Lovecraft’s works.

    1. Well the problem is that I’m kind of limited to what’s in the public domain for 2 reasons:
      – I’m cheap
      – easy to find electronic copies in plain text on sites like

      So the only things I found there were a couple of Lovecraft texts. I’ve added them to the corpora and am redoing the analysis; you’ll see a new option shortly.

      thanks for the suggestions 🙂

    1. Yeah, it’s not new; the interesting part is applying the algorithm to much more. I’m starting with words, then text, but music is coming soon. It’s something I worked on a few years ago, and the results are very, very cool. I also have a few other ideas of things to apply it to (images), but that’s a bit more experimental.
