Focus & Blur: Behavioral Inference & the Tattletale Browser

This web thing’s been bugging me for too long. Have you ever tried to background a tab that is playing insufferable & unskippable content, only to find that the annoyance has paused itself until your eyeballs are known to be aimed back at it? Why do browsers honor requests to let websites know whether you’re paying attention or not?

This is achieved with the focus and blur events. Many UI elements rely on them to trigger useful responses; think of a suggestion box that shows up when you click in a search bar, for example. The window element, though, is one for which I cannot think of a single instance where the focus and blur events are used to benefit the user. I think a well-intended pair of events was implemented across every possible element, but on one of them it reveals more than was intended and gets abused accordingly. So why don’t ad-blockers nuke them? I’ve gone down this rabbit hole several times over the years trying to find an extension or adblocker customization that dismisses these events. Alas, they never seem to have made it into the crosshairs as the true annoyance that they are. How do you like having your browser report how good you are at consuming content as intended?

These events are responsible for more ills than making sure you’re watching: they are a key metric for inferring behavior. As with much of data mining, what’s scary isn’t really the information you’re giving away, it’s what can be inferred from it. In a way, these attention events are perfectly suited to the attention age. They matter particularly when they are attached to the window element; as far as I know, that is the only method I’ve seen abused for this purpose in the wild.
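For reference, the pattern this is all about is as simple as it gets, something along these lines (the player and tracker objects are hypothetical placeholders, but this is the general shape of what sites do):

window.addEventListener( "blur", function() {
	player.pause() ;                     // hypothetical player: you looked away, pause the content
	tracker.send( "attention_lost" ) ;   // hypothetical tracker: phone home about it
} ) ;
window.addEventListener( "focus", function() {
	player.play() ;                      // eyeballs are back, resume
} ) ;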

In any case, since I never could find anything, here’s what I came up with. The best way I’ve found to run user JS on all websites is Tampermonkey. Here’s the script I’m running:

// ==UserScript==
// @name         Attention Event Nuker
// @namespace    http://tampermonkey.net/
// @version      2024-05-01
// @description  nukes focus and blur events when attached to the window element
// @match        *://*/*
// @run-at       document-start
// @grant        none
// ==/UserScript==

(function() {
    // keep a reference to the real addEventListener before the page gets to use it
    var old_add_event_listener = EventTarget.prototype.addEventListener ;

    EventTarget.prototype.addEventListener = function( event_name, event_handler, options ) {
        // only swallow focus/blur when they are attached to the window itself
        if( this.toString()==window.toString() &&
           (event_name=="blur" || event_name=="focus") ) {
            console.log( "attention event caught: " + event_name + " on: " + window.location.host ) ;
        } else {
            old_add_event_listener.call( this, event_name, event_handler, options ) ;
        }
    };
})();

Unfortunately I did run into a couple of sites that somehow rely on these events to even work properly. I don’t want to reverse engineer them one by one, so I’m adopting a blacklist of sites where the script stands down, which is a bit obnoxious. For a while I had the script report which sites were asking for the events; the results weren’t surprising and showed that pretty much any big site with a baseline of behavioral data mining wants to know what your eyeballs are in front of.
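If you want the same compromise, here’s a minimal sketch of the blacklist variant, meant to replace the override inside the script above (the host listed is a placeholder, not my actual list):

var blacklisted_hosts = [ "some-site-that-breaks.example" ] ;	// placeholder: sites that get to keep their events

EventTarget.prototype.addEventListener = function( event_name, event_handler, options ) {
	var is_attention_event = ( event_name=="blur" || event_name=="focus" ) &&
	                         this.toString()==window.toString() ;
	if( is_attention_event && blacklisted_hosts.indexOf( window.location.host )==-1 ) {
		console.log( "attention event caught: " + event_name + " on: " + window.location.host ) ;
		return ;
	}
	old_add_event_listener.call( this, event_name, event_handler, options ) ;
};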

Quiet Airtags

I didn’t post several years ago about the GPSes I installed on our farm vehicles; it felt like painting a target on my back. It took quite a bit of figuring out to set up Particle.io’s early asset trackers. They’ve since created a dedicated, preprogrammed, and well-polished device, seeing an opportunity in the success of the early hobbyist version I suppose. I never posted my setup, code, or experience, but let’s just say it worked well for a few years, for very cheap. Unfortunately, the 2G network they relied on was eventually retired, and that forced me to reconsider my options.

And well, the obvious contender these days is the Airtag. I bought a few for testing, and they quickly became the obvious choice. I replaced the bulky cellular GPSes with them and folded them into home monitoring, watching for geofences, battery status, and last contact.

While I can’t wire them directly to the vehicle’s battery, their battery does seem to last a good year (Vermont winters wear them down faster). And they come with several huge advantages over GPSes.

  • A mesh network of people’s iPhones has much better coverage than cellular in a rural area. Cell phones will report them whenever they finally get to a tower or some Wi-Fi.
  • They aren’t subject to tree or cloud cover.
  • They are tiny! I went to great lengths to paint and find a place for bulky GPS boxes. Airtags, on the other hand, will live anywhere.
  • They are cheap, and have no recurring cost (except the cell battery once a year).

These advantages led me to significantly lower the bar for what I stick them on. It’s no longer reserved for the expensive vehicles: if it costs money and isn’t fastened to the ground, it gets an Airtag.

Of course, when used for theft tracking, their chirping is problematic. So I finally bit the bullet and gave them the surgery they need to make them quiet. It was very, very trivial; I should have done this much earlier.

Open them up. I used a stronger blade than an X-Acto for prying. Note the 3 Sharpie dots marking the tabs.

I simply snipped the 2 wires going to the speaker

Still works!

Analyzing Ultrasonic Signals

Disclaimer: I and other people may or may not have had anything to do with figuring this out.

Zoom Rooms emit ultrasound to let devices within “earshot” connect without the user having to type anything. Ultrasound has been used for such proximity-related convenience before, and sometimes for more nefarious purposes, such as mapping out who is next to whom in the world. All of it over a wireless network that is available anywhere and completely ad-hoc. It just doesn’t go very far (thankfully).

In this post we’ll see how to analyze such a signal using Zoom Rooms’ Share Screen signal as an example.

Harvesting the Sound

First things first: to decipher the signal, we need to extract the interesting bits from a fuller audio landscape containing “noise” across various frequencies. Ultrasonic frequencies, the ones the human ear can’t hear, are by definition above 20kHz. This fluctuates between individuals, and especially age groups 🙂 but that’s the general cut-off: anything above 20kHz is unhearable, thus usable to transmit hidden signals. Microphone and speaker manufacturers have no incentive to build products that work beyond the human range, humans being the ones giving them money after all. And so hidden signals tend to sit right around the 20kHz cutoff, where human-centered manufacturing specs still have a good chance of working.
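If you’re curious where your own ears give up, here’s a quick sketch you can paste in a browser console; it sweeps a sine tone from 15kHz to 21kHz over 6 seconds (assuming your speakers even reach that high, which is itself part of the point):

(async function() {
	var context = new AudioContext() ;
	var o = context.createOscillator() ;
	o.type = "sine" ;
	o.connect( context.destination ) ;
	// sweep from 15kHz to 21kHz over 6 seconds; note when you stop hearing it
	o.frequency.setValueAtTime( 15000, context.currentTime ) ;
	o.frequency.linearRampToValueAtTime( 21000, context.currentTime + 6 ) ;
	o.start() ;
	o.stop( context.currentTime + 6 ) ;
})();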

Audacity is great for recording and analysis, and it’s what we’ll use here. First, record your sample: using your laptop, get close to the source of the ultrasound, try to keep things quiet during recording, and gather a good sample.

Extracting the Signal

Looking at various time zoom levels, we can home in on a repeating pattern. The room is completely silent, but your computer hears it loud and clear.

The complete silence as it was recorded. Note the time scale just above the recording.

Zooming in on the pattern, each of the pattern’s blobs is made of several smaller blobs. At this level we can guess that the signal is made of 10 notes. Zoom Rooms’ Share Screen codes are 6 digits long, so that feels about right, with the extra characters either being control characters or room for expansion. We don’t really know yet where the signal starts and where it ends; I’m drawing the rectangle for 10 notes starting on the quiet one, because I could see a silence being used as a separator, much like a space or a line return.

Finally, we zoom in just enough to start poking at each individual note/character.

Pretty easy so far: all we’ve done is record and zoom, and we can already start seeing our signal. Now begins the tedious task of annotating as many samples of this signal as you can. This is the data that will let you decipher the code.

Select a clean section of just one note; don’t grab the edges, just the meat.

Then click on Analyze -> Plot Spectrum.

This will run a Fourier transform on the selected area to decompose the sound into all of its various frequencies.

Place the cursor on the highest ultrasonic peak, and read the frequency: 19201Hz, or 19.2kHz, here.

Make note of it by adding a label at the selection. You’ll first need a label track if you don’t already have one.

Then you can add the label.

Do this for each note/character in the signal; it’s worth confirming the 10-character repetition we think we’re seeing. Then do this for many more samples… The more data, the easier it’ll be to decipher. Your project, which you should save often, will look something like this:
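As a side note, if you’d rather not eyeball Plot Spectrum for every note, the same peak-reading step can be scripted with the Web Audio API’s AnalyserNode. Here’s a rough sketch you can run from the browser console; it asks for microphone access, and the 19–20.5kHz window is just my assumption of where the interesting notes live:

(async function() {
	var context = new AudioContext() ;
	await context.resume() ;
	var stream = await navigator.mediaDevices.getUserMedia( { audio: true } ) ;
	var source = context.createMediaStreamSource( stream ) ;
	var analyser = context.createAnalyser() ;
	analyser.fftSize = 4096 ;	// ~85ms window at 48kHz, ~11.7Hz per bin: short enough to catch individual notes
	analyser.smoothingTimeConstant = 0 ;	// no averaging between frames
	source.connect( analyser ) ;

	var bins = new Float32Array( analyser.frequencyBinCount ) ;
	var bin_width = context.sampleRate / analyser.fftSize ;

	// every 50ms, report the loudest bin between 19kHz and 20.5kHz
	setInterval( function() {
		analyser.getFloatFrequencyData( bins ) ;
		var peak_bin = 0, peak_db = -Infinity ;
		for( var i=Math.floor(19000/bin_width) ; i<Math.ceil(20500/bin_width) ; i++ ) {
			if( bins[i]>peak_db ) { peak_db = bins[i] ; peak_bin = i ; }
		}
		console.log( "loudest ultrasonic peak: " + Math.round( peak_bin*bin_width ) + "Hz (" + Math.round( peak_db ) + "dB)" ) ;
	}, 50 ) ;
})();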

Deciphering the Signal

Now, obviously, depending on what signal you are studying, the encoding will differ. I’m only talking about Zoom Rooms here, so I’ll give some general advice followed by the Zoom algorithm.

The general advice is as follows:

1. Gather a lot of data; this is the non-exciting part, so it’s easy to want to move past it.

2. More often than not, there will be control notes/characters indicating the beginning of the signal, the end, or both. In the screenshot above, 19.1kHz followed by 19.2kHz looks very much like a control sequence; align your audio samples to it and focus on the remaining notes/characters.

3. Look at the data from different angles and write it out in different ways to see if patterns emerge. Spreadsheets can help.

4. Try various scenarios, even if you know they are likely false, they might get you closer to the truth.

5. Occam’s razor (or the lazy programmer) is likely a good guess.

Zoom Room Rosetta Stone

Each signal starts with 19.1kHz followed by 19.2kHz. Then the 6 digits of the code displayed on the screen are “played”. Then come 2 digits representing the checksum of the 6 digits, which is simply their sum. That’s 10 characters total.

Each digit maps to 2 possible frequencies:

0 -> 19.2kHz / 19.3kHz
1 -> 19.3kHz / 19.4kHz
2 -> 19.4kHz / 19.5kHz
3 -> 19.5kHz / 19.6kHz
4 -> 19.6kHz / 19.7kHz
5 -> 19.7kHz / 19.8kHz
6 -> 19.8kHz / 19.9kHz
7 -> 19.9kHz / 20.0kHz
8 -> 20.0kHz / 20.1kHz
9 -> 20.1kHz / 20.2kHz

It’s weird how the frequencies can overlap, and that is the twist behind this encoding, everything else being rather straightforward: for each digit you always pick the frequency furthest from the frequency you just played. If you just played your control signal (19.1kHz, 19.2kHz) and your code starts with a 3, you will play the 3 as 19.6kHz, since it is furthest from 19.2kHz. If your next digit is a 2, you will play it as 19.4kHz, since that is furthest from 19.6kHz. I don’t know enough about sound engineering to say whether Zoom did this to disambiguate frequencies that sit close to each other, or whether it’s meant as a cipher. I’m guessing the former: it seems like a smart way to guarantee that two consecutive notes never land on the same frequency (even for repeated digits) while barely expanding the spectrum used. Since devices are likely to distort right at the edge of the inaudible range, it makes sense to make an extra effort to distinguish consecutive characters while not pushing much further into that range. Pretty cool eh?

Here’s a real-world example. Say the code played is 790155:

first you play the control: 19.1kHz, 19.2kHz

then 7 with 20.0kHz as it is the furthest from 19.2kHz
then 9 with 20.2kHz as it is the furthest from 20.0kHz
then 0 with 19.2kHz as it is the furthest from 20.2kHz
then 1 with 19.4kHz as it is the furthest from 19.2kHz
then 5 with 19.8kHz as it is the furthest from 19.4kHz
then 5 with 19.7kHz as it is the furthest from 19.8kHz

compute your checksum: 7+9+0+1+5+5 = 27

then play 2 with 19.4kHz as it is the furthest from 19.7kHz
then 7 with 20.0kHz as it is the furthest from 19.4kHz

Voila!

Some Code to Go Along with It

If you want to play the Share Screen code from your Zoom Rooms into the world, the following code will do it for you for a little while. Just make sure to update the “code” variable near the top. It works in your standard browser’s web inspector console.

(async function main () {

	var code = "<6_digit_code_goes_here>" ;

	// set up a single sine oscillator that we'll retune for each note
	var context = new AudioContext() ;
	var o = context.createOscillator() ;
	o.type = "sine" ;
	var g = context.createGain() ;
	o.connect( g ) ;
	o.frequency.value = 0 ;
	g.connect( context.destination ) ;
	o.start( 0 ) ;

	// how long each note is held
	var sleep_time_ms = 50 ;

	var control_frequency = 19100 ;
	// each digit maps to 2 possible frequencies (in Hz)
	var frequencies = {
		0:[19200,19300],
		1:[19300,19400],
		2:[19400,19500],
		3:[19500,19600],
		4:[19600,19700],
		5:[19700,19800],
		6:[19800,19900],
		7:[19900,20000],
		8:[20000,20100],
		9:[20100,20200],
	} ;

	console.log( "starting ultrasound emission" ) ;
	var last_frequency ;
	var i = 50 ;	// repeat the whole signal 50 times
	while( i>0 ) {
		i-- ;

		// control: 19.1kHz then 19.2kHz
		o.frequency.value = control_frequency ;
		await new Promise( r => setTimeout(r, sleep_time_ms) ) ;
		o.frequency.value = control_frequency + 100 ;
		await new Promise( r => setTimeout(r, sleep_time_ms) ) ;

		// payload: the 6 digits of the code
		var checksum = 0 ;
		last_frequency = o.frequency.value ;
		for( var j=0 ; j<code.length ; j++ ) {
			checksum += parseInt( code[j] ) ;
			o.frequency.value = pick_furthest_frequency( last_frequency, frequencies[parseInt(code[j])] ) ;
			last_frequency = o.frequency.value ;
			await new Promise( r => setTimeout(r, sleep_time_ms) ) ;
		}

		// checksum: the sum of the 6 digits, zero-padded to 2 digits
		checksum = checksum.toString() ;
		if( checksum.length==1 ) {
			checksum = "0" + checksum ;
		}

		o.frequency.value = pick_furthest_frequency( last_frequency, frequencies[parseInt(checksum[0])] ) ;
		last_frequency = o.frequency.value ;
		await new Promise( r => setTimeout(r, sleep_time_ms) ) ;
		o.frequency.value = pick_furthest_frequency( last_frequency, frequencies[parseInt(checksum[1])] ) ;
		last_frequency = o.frequency.value ;
		await new Promise( r => setTimeout(r, sleep_time_ms) ) ;
	}

	console.log( "stopped ultrasound emission" ) ;
	// fade out and stop the oscillator
	g.gain.exponentialRampToValueAtTime( 0.00001, context.currentTime + 0.04 ) ;
	o.stop( context.currentTime + 0.05 ) ;

	// of a digit's 2 possible frequencies, always play the one furthest from the previous note
	function pick_furthest_frequency( previous, possible_new_frequencies ) {
		if( Math.abs(previous-possible_new_frequencies[0]) > Math.abs(previous-possible_new_frequencies[1]) ) {
			return possible_new_frequencies[0] ;
		}
		return possible_new_frequencies[1] ;
	}

})();
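And going the other way, here’s a rough decoder sketch of my own (nothing Zoom ships): give it the 10 note frequencies you measured, in Hz, and it drops the control pair, walks the same furthest-frequency rule backwards, and checks the sum:

function decode_notes( note_frequencies ) {
	var frequencies = {
		0:[19200,19300], 1:[19300,19400], 2:[19400,19500], 3:[19500,19600], 4:[19600,19700],
		5:[19700,19800], 6:[19800,19900], 7:[19900,20000], 8:[20000,20100], 9:[20100,20200],
	} ;
	var notes = note_frequencies.slice( 2 ) ;	// drop the 19.1/19.2kHz control pair
	var last_frequency = note_frequencies[1] ;
	var digits = [] ;
	for( var i=0 ; i<notes.length ; i++ ) {
		// find the digit whose "furthest from the previous note" pick matches what we heard
		for( var d=0 ; d<=9 ; d++ ) {
			var options = frequencies[d] ;
			var pick = Math.abs(last_frequency-options[0]) > Math.abs(last_frequency-options[1]) ?
				options[0] : options[1] ;
			if( pick==notes[i] ) { digits.push( d ) ; break ; }
		}
		last_frequency = notes[i] ;
	}
	var code = digits.slice( 0, 6 ).join( "" ) ;
	var checksum = parseInt( digits.slice( 6 ).join( "" ) ) ;
	var sum = digits.slice( 0, 6 ).reduce( function( a, b ) { return a+b ; }, 0 ) ;
	console.log( "code: " + code + " checksum ok: " + ( sum==checksum ) ) ;
	return code ;
}

// the example above, 790155, as a list of note frequencies
decode_notes( [19100,19200, 20000,20200,19200,19400,19800,19700, 19400,20000] ) ;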