Turning your web traffic into a Super Computer

Full disclaimer:

The subject matter of this post is controversial as it discusses extracting computing resources from the visitors of a website. There are a lot of discussions at the moment centered around web-browser based crypto currency mining. Most paint a deplorable picture of the practice; please keep in mind that there are very desirable paths alongside which these practices can develop. I am not elaborating on these arguments here, I am only describing a method to harness the resources.

Premise

Web browsers are becoming quite powerful for code execution. Between Javascript’s increase in capability, WebAssembly, access to GPU & threading, a web browser today is almost as desirable for computing as the machine it’s running on. Ever since the rise of web-based crypto currency miners, I’ve been thinking of harnessing all that computing power as a single entity: a super computer made of your visitor’s web browsers.

Just like a regular computer cluster, the nodes all participate in a coordinated fashion to solving a single problem. Unlike a regular computer cluster, the nodes are very ephemeral (as website visitors come and go) and can’t talk to each other (no cross site requests).

Here’s a demo of what I came up with:

Right: the super computer control server
Left: one of the web clients contributing to the super computer simply by being connected to a website (& CPU metrics)

The problem being solved here is the hashing of 380,204,032 string permutations to find the reverse of a given hash. Problem parameters were chosen to make heavy processing quick for the clients.

Implementation & code samples

At the core of the idea is the websocket technology. It creates a persistent connection between a server and all of the nodes (the visitors of your website). This connection can be used to orchestrate actions between the nodes so that they can act as a concerted entity. From delivering the code to passing messages for coordination, websockets are what make everything possible.

Having a websocket connection to clients dramatically changes what you can do with web clients. They are fully addressable for the duration of their visit. They may show up on a website and be served some pre-established javascript; but with websockets, any javascript can materialize at any time.

Right: the super computer control server
Left: a web client being given an instruction on the fly

 

Slightly tangential but still worth considering, using a web view app, Javascript can pass execution to the app itself. This means code showing up on the websocket can escape the webview bubble and go into app land.

Right: the super computer control server
Left: a web app being given an instruction which percolates to the app layer

 

Now this is nothing new in a lot of ways; apps can be made to get instructions from C&Cs, and websites can get Javascript after the initial page load from dynamic sources. The websocket technique though is as dynamic as it gets (no Ajax pull), it is portable to many browsers and many devices, it is hard to catch looking at a web inspector; lastly, it executes with full access to the context it materialized in.

So we’ve established that websockets can be used to dynamically deliver code to be ran by the nodes. It can also be used for message passing and the overall orchestration of distributing the problem to be solved.

Crackzor.js

6 years ago I wrote a ditributed OpenMPI based password cracker: crackzor. Password cracking is a good distributed problem to solve because it’s a fairly simple problem: run through all the character permutations. The fact that it exhausts a known space also means benchmarking is easy. So to put the idea of a transient node javascript super computer in practice, I rewrote crackzor in JS instead of C, and for websockets instead of OpenMPI.

Every distributed problem is different and crackzor itself isn’t a magic way to distribute any problem to be solved. The magic of crackzor is its ability, given a space of character permutations, to divide it up in chunks which can be processed by the nodes. Given the problem, a start iteration and end iteration, a node can get to work without having to be provided the permutations themselves, thus removing the bandwidth bottleneck.

The first challenge: maximizing usage of the node’s CPU.

Javascript runs single threaded by default, so when the websocket sends code to be ran by a client, by default, the code running as fast as it can will only be able to fill one core of the CPU. A large majority of machines today have many more cores available. So we have to figure out how to use them or our super computer is going to loose a large portion of its processing power right off the bat.

Web workers to the rescue. With HTML5, it’s easy as pie to thread code. The one trick with the code we want to thread is that it can’t be gotten from a file as the web worker documentation suggests. That’s because our code doesn’t come from a static javascript file remember? It shows up out the the blue on the websocket, so it came from the network and is now in memory somewhere => not a file we can refer to.

The solution is to wrap it in a blob as such

var worker_code = 'alert( "this code is threaded on the nodes" );'

window.URL = window.URL || window.webkitURL;

var blob;
try {
    blob = new Blob([worker_code], {type: 'application/javascript'});
} catch (e) {
    window.BlobBuilder = window.BlobBuilder || window.WebKitBlobBuilder || window.MozBlobBuilder;
    blob = new BlobBuilder();
    blob.append(worker_code);
    blob = blob.getBlob();
}
workers.push( new Worker(URL.createObjectURL(blob)) ) ;

Here you’ll notice we have our first layer of encapsulation. The code relevant to the problem we are solving is in the variable worker_code, the rest of the javascript only threads it.

Having distributed amongst a node’s cores, we now look at

the second challenge: distributing between the nodes

This work is obviously up to the websocket server along with subsequent coordination. Without going into too much details, the websocket server keeps track of all the nodes as they come and go, it also keeps track of which ones are working or not, allocates new chunks of the problem to nodes as they become available.

A trick of the websocket server is that it is running at all times to handle node connections. Super computer problems however may change from one day to the next. To address that, I give it a function which reads a file and evals its code; the function is summoned by a process signal. As such:

function eval_code_from_file() {
    if( !file_exists("/tmp/code") ) {
        console.log( "error: file /tmp/code does not exist" ) ;
    } else {
        var code = read_file( "/tmp/code" ) ;
        code = code.toString() ;
        eval( code ) ;
    }
}

process.on('SIGUSR1', eval_code_from_file.bind() );

With this puppy in place, the next time I “kill -USR1 websocket_server_PID”, it will be imbued with new code that did not exist when it started. Does this sound familiar? Yup, javascript is super interesting in the ability it gives you to run arbitrary code at any time with full access to the established context.

Thus arrive the 2nd and 3rd layers of encapsulation, the code which will be distributed to the nodes is in a file which is to be evaled on the websocket server side and sent over the websocket to the clients.

The actual distribution to the nodes is simple, have them connect with a callback to eval code. Something like that:

Client:

var websocket_client=io.connect("http://websocket_server.domain.com") ; 
websocket_client.on( "eval_callback",function(data){data=atob(data),eval(data)}.bind() ) ;

Server:

client_socket.emit( "eval_callback", new Buffer("alert('this code will run on the client');").toString("base64") ) ;

Recapping where we are

So…

  1. all the transient nodes (web browser of website visitors) attach to a websocket server
  2. the websocket server receives SIGUSR1 which signals it to execute new code it gets from a file
  3. this new code gives the websocket server a packaged problem to be solved by the nodes
  4. this new code also instructs how the websocket server will distribute and coordinate the nodes
  5. once the packaged problem to be solved shows up on a node, it is evaled and it contains threading to maximize CPU usage.

And there you have it,

all the pieces you need to make a super computer from your web traffic. I’m choosing not to publish the full code of my implementation for reasons of readability, security and complexity but I can go into more details if asked.

The same way that peer-to-peer protocols made any data available anywhere any time, could this do the same for computing power? Mind=blown, and your CPU along with it.

More tips

  • When choosing a chunk size for clients to work on, it’s important to not pick too big a size. The nodes are very transient and a big chunk size means the chunk’s processing is more likely to be interrupted. Most web browsers also offer to kill poorly coded javascript running berserk and so a small chunk size taking a few seconds and letting the machine catch it’s breath briefly will make it less likely that a browser will notify a user that a script needs to be killed.
  • When encapsulating out the wazoo, keep in mind that Internet Explorer (Edge or whatever it’s called today) doesn’t support backticks.
  • Syntax highlighting will be confused by the strings in strings in strings of encapsulation, it helps to just turn it off.
  • Javascript md5 implementation here: https://gist.github.com/josedaniel/951664
  • I found it necessary to keep track of an average time to solving a chunk so that I may exclude the nodes which are taking too long and polluting the good performance of the supercomputer.

16 Replies to “Turning your web traffic into a Super Computer”

  1. Ce qui m’intéresse n’est pas un super serveur capable de gérer tous les calculs de «node», mais un système informatique qui n’a pas d’entités mais qui ne s’appuie que sur ces «node». C’est-à-dire, il est basé sur l’informatique distribuée non centrale, et il n’y a pas de super planificateur (le server qui contrôle les node).

    1. Salut, merci pour le commentaire 🙂 Pour l’instant je ne compte pas travailler plus loin que cette preuve de concept mais je veux bien partager du code en plus que ce qui est présent dans mon article si cela t’interesse. Comme je le dis vers la fin, c’est difficile a partager tout le code dans un package car il y a pas mal de complexité. Certaines parties sont du coté du client, d’autre du coté du serveur, d’autres encore sont volatiles. Il serait très difficile de comprendre comment tout mettre en place si je te donnais ce que j’ai maintenant. Par contre, avec l’article comme plan, je suis content de partager les pièces du puzzle alors qu’elles te deviennent necessaires. Elles seront alors beaucoup plus faciles a implementer dans leur contexte.

      I hope this helps.

  2. today , I very Surprise that you have the same idea.

    in 2015 , i have this idea,so write for can calc the people`s distance on Earth。

    in 2016, public the code on git , and upload the video, later for some reason, i delet all.

    so now i Improve the Calculation method , and try to solve the Privacy and verification。

    my email : 2662908789@qq.com

    1. Welcome to the internet where there is no such thing as an original idea. Implementation & presentation are far more relevant in bringing ideas to the forefront. Verification is definitely a challenge of turning this into a viable technology as many other commenters pointed out, I’ll be curious to see what you come up with.

  3. You cannot really trust the browsers, because people might mess with the code in their browser to send you back bogus results. Someone who is angry that you’re using their CPU might want to do this to get revenge, or just for lulz.

    To get some level of trust you might have to send the task to multiple browsers and take the most common answer. But if the attacker has a lot of browsers, you might end up sending the request multiple times to the same attacker.

    Cryptocurrency miners don’t really have this problem because it’s a task that takes a fairly long time to compute (find a block with a hash that complies with the difficulty) but only a tiny time to verify (verify that the block has a hash that complies with the difficulty). This is the core idea behind bitcoin mining: proof of work that takes time to do but can be verified easily.

    While password cracking in a way seems very similar to cryptocurrency mining, there is an important difference. There is only 1 possible correct password (ignoring hash collisions), but there are infinite equally-probable blocks that comply with the cryptocurrency difficulty. So with password cracking the domain needs to be split between the browsers, and if one browser misbehaves, its range won’t have coverage and if the password was in that range, you’ll never find it. But with cryptocurrency mining, no organized splitting is necessary, just randomly assign browsers ranges from a huge domain, and if one browser misbehaves, it’s no worse than if that browser never joined.

    1. you are right, recently I was thinking about something like ddos mining, where you buy tickets for the prize – funded by the attacker – giving proof_of_handshake with X website in time range Y, to the pool.

      Then you have to trust the pool for the payout, even if at least you can create some proof of past honesty.

      I find quite incredible nobody still does this, it’s like security through obscurity but on the non-scalability side, of current ddos, surely to speed the internet development an anti-protocol is needed imho.

  4. That’s a great way to take care of the traffic and do something better. I always had a similar concept in mind, to have some sort of tracker and all the processing to be done by the client side. Sadly, not everything will be available and security will be a big issue, but it’s a fun hacking experiment.

    Thanks for sharing this, I hope you’ll come up with new experiments in the future.

  5. now turn it into a network that sells compute time and allow content creators to make money with computing power instead of ads — with visitor’s consent. what Salon is doing, but at auction, anywhere, and for anything, not just crypto.

    1. I thought about it, bundling the cycles for resale which finances the websites which participate in the pool. No ads, cheap computing. It’s perfect 🙂 The only obstacle really is that it’s a gigantic project, not just a little proof of concept like I just did. And I really don’t have time for giant projects…

  6. Awesome. Seems like getting a Linux kernel and a LAMP stack running on this would be a super awesome way that could lead to decentralizing server farms. The technical challenge of doing it sounds fun and interesting. The real payoff may be when a billing and payment system is able to consolidate billing from processing clients and distribute the payments to the owners of the processing clients based on the work performed

    1. There’s definitely a company to spin off from the idea :). Computing power is becoming a more and more relevant resource nowadays.

Leave a Reply

Your email address will not be published. Required fields are marked *