The static experiment

Akrin is a server whose soul has been through many iterations of old hardware. It never needed many resources, so I easily got away with $30 PCs bought at the university surplus.

It currently resides on an aged Pentium IV with just 500MB of RAM and an old IDE hard drive. With the addition of more and more projects (recently: a CCTV installation, new sites such as www.blindspotis.com, database-intensive Markov chain generation), it’s close to maximum capacity and could use an upgrade.

Rather than just buying new hardware, I’ve decided it was time to change how computing is done at home: I’m going for no moving parts. This means no fans, no spinning disks and no moving heads.

What are the advantages?

  • no vibrations, not an iota of noise
  • no jet-takeoff sound when running heavier computations
  • no malfunctioning fans that could result in a fire hazard
  • supposedly hardware that is more resistant to shocks
  • fanless means less powerful, which in turn means less power consumption

Here’s what I ordered:

The main item is the EPC-6542 fanless PC itself. It doesn’t come with RAM or a hard drive. I like the small form factor and the fact that it has 2 NICs; this means it can easily be recycled into a nice router should the experiment fail.

  • Some RAM (DDR2 SO-DIMM); I went for the maximum 2GB that the EPC-6542 will support. ($45) link
  • A 2.5″ SATA II 128GB solid-state disk (SSD) ($223 – $75 mail-in rebate = $148) link

Now, SSDs are pretty expensive compared to traditional hard drives, so it is a high price to pay for no moving parts. But they are also much faster, and because the CCTV cameras record 24/7, I think the I/O speed gain will have a tremendous overall effect on the server.

Akrin will soon run on $423 of new hardware; this is unprecedented 🙂

To be continued…

Markov-chain-based random word generation

Markov chains are used primarily in Natural Language Processing for part-of-speech tagging: corpora are studied to establish how sentences are constructed. This very powerful technique can also be used to generate new material (words, text, et cetera). In this first post I will talk about generating words.

  • How it works

Given a corpus, letter patterns are studied at different depths. For depth one, the probability of a letter following another is established. For depth two, the probability of a letter following a sequence of 2 letters is established. The same goes for greater depths. The result of all this studying is a table of probabilities defining the chances that letters follow given sequences of letters.
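To make the studying step concrete, here is a minimal Python sketch of how such a table could be built. It assumes the corpus is a plain list of words; names like build_table, DEPTH and NULL are mine for illustration, not from the original implementation:

    from collections import defaultdict

    DEPTH = 2   # how many preceding letters each prediction is based on (assumed)
    NULL = ""   # the "null letter" used to pad word boundaries (assumed)

    def build_table(words, depth=DEPTH):
        """Count, then normalize, how often each letter follows each
        sequence of `depth` letters (including the null padding)."""
        counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            # pad with `depth` null letters in front and one at the end
            letters = [NULL] * depth + list(word) + [NULL]
            for i in range(depth, len(letters)):
                prefix = tuple(letters[i - depth:i])
                counts[prefix][letters[i]] += 1

        # turn the counts into probabilities so that each row sums to 1
        table = {}
        for prefix, followers in counts.items():
            total = sum(followers.values())
            table[prefix] = {letter: n / total for letter, n in followers.items()}
        return table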

When the time comes to generate words, this table of probabilities is used. Say we need to generate a word at depth 2: we seed the word with 2 null letters, then look in the table for all the letters that can follow a sequence of 2 null letters, along with their associated probabilities. Those probabilities obviously add up to 1. We generate a random number between 0 and 1 and use it to pick which following letter will be chosen. Let’s say that the letter “r” was chosen. Our generated word is now comprised of “null” and “r”. We now use this sequence as the basis for our next letter and look for the letters that can follow it. We keep going until a null letter is reached, signifying the end of the generated word.
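Here is the matching generation step as a sketch, reusing build_table, DEPTH and NULL from the snippet above; random.random() plays the role of the number between 0 and 1:

    import random

    def generate_word(table, depth=DEPTH):
        """Generate one word by walking the probability table."""
        word = []
        prefix = (NULL,) * depth             # seed with `depth` null letters
        while True:
            followers = table[prefix]        # letters that can follow this sequence
            r = random.random()
            cumulative = 0.0
            for letter, probability in followers.items():
                cumulative += probability
                if r <= cumulative:
                    break
            if letter == NULL:               # a null letter marks the end of the word
                return "".join(word)
            word.append(letter)
            prefix = prefix[1:] + (letter,)  # shift the window to the last `depth` letters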

Here’s a sample of a probability table:

  • Benefits of this algorithm

It will generate words that do not exist but respect the essence of the corpus it’s based on. This is really cool, for example, to generate words that sound English but aren’t (say, for random passwords that can be pronounced and remembered). We could also make a list of all the cool words (motorcycle, sunglasses, racing, et cetera) and extract their essence to generate, maybe, a product name that is based on coolness :).
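As a purely hypothetical usage of the sketch above, one could feed it a short list of English (or “cool”) words and print a few candidates:

    corpus = ["motorcycle", "sunglasses", "racing", "experiment", "probability"]
    table = build_table(corpus, depth=2)
    for _ in range(5):
        print(generate_word(table, depth=2))   # e.g. pronounceable non-words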

Go ahead and play with it: