Linux kernel patches “performance can be harmful” bug in video driver

Remember all those funkily named bugs of recent memory, such as Spectre, Meltdown, F**CKWIT, and RAMbleed?

Very loosely speaking, these types of bug – perhaps they’re better described as “performance costs” – are a side effect of the ever-increasing demand for ever-faster CPUs, especially now that the average computer or mobile phone has multiple processor chips, typically with multiple cores, or processing subunits, built into each chip.

Back in the olden days (by which I mean the era of chips like the Inmos Transputer), received wisdom said that the best way to do what is known in the jargon as “parallel computing”, where you split one big job into lots of smaller ones and work on them at the same time, was to have a large number of small and cheap processors that didn’t share any resources.

They each had their own memory chips, which meant that they didn’t need to worry about hardware synchronization when trying to dip into each other’s memory or to peek into the state of each other’s processor, because they couldn’t.

If job 1 wanted to hand over an intermediate result to job 2, some sort of dedicated communications channel was needed, and accidental interference by one CPU in the behavior of another was therefore sidestepped entirely.

Transputer chips each had four serial data lines that allowed them to be wired up into a chain, mesh or web, and jobs had to be coded to fit the interconnection topology available.

Share-nothing versus share-everything

This model was called share-nothing, and it was predicated on the idea that allowing multiple CPUs to share the same memory chips, especially if each CPU had its own local storage for cached copies of recently-used data, was such a complex problem in its own right that it would dominate the cost – and crush the performance – of share-everything parallel computing.

But share-everything computers turned out to be much easier to program than share-nothing systems, and although they generally gave you a smaller number of processors, your computing power was just as good, or better, overall.

So share-everything was the direction in which price/performance, and thus the market, ultimately went.

After all, if you really wanted to, you could always stitch together several share-everything parallel computers using share-nothing techniques – by exchanging data over an inexpensive LAN, for example – and get the best of both worlds.

The hidden costs of sharing

However, as Spectre, Meltdown and friends keep reminding us, system hardware that allows separate programs on separate processor cores to share the same physical CPU and memory chips, yet without treading on each other’s toes…

…may leave behind ghostly remains or telltales of how other programs recently behaved.

These spectral remnants can sometimes be used to figure out what other programs were actually doing, perhaps even revealing some of the data values they were working with, including secret information such as passwords or decryption keys.

And that’s the sort of glitch behind CVE-2022-0330, a Linux kernel bug in the Intel i915 graphics card driver that was patched last week.

Intel graphics cards are extremely common, either alone or alongside more specialised, higher-performance “gamer-style” graphics cards, and many business computers running Linux will have the i915 driver loaded.

We can’t, and don’t really want to, think of a funky name for the CVE-2022-0330 vulnerability, so we’ll just refer to it as the drm/i915 bug, because that’s the search string recommended for finding the patch in the latest Linux kernel changelogs.

To be honest, this probably isn’t a bug that will cause many people much concern, given that an attacker who wanted to exploit it would already need:

  • Local access to the system. Of course, in a scientific computing environment, or an IT department, that could include a large number of people.
  • Permission to load and run code on the GPU. Once again, in some environments, users might have graphics processing unit (GPU) “coding powers” not because they are avid gamers, but in order to take advantage of the GPU’s huge performance for specialised programming – everything from image and video rendering, through cryptomining, to cryptographic research.

Simply put, the bug involves a processor component known as the TLB, short for Translation Lookaside Buffer.

TLBs have been built into processors for decades, and they are there to improve performance.

Once the processor has worked out which physical memory chip is currently assigned to hold the data that a user’s program refers to as, say, “address #42”, the TLB lets the processor side-step the many repeated memory address calculations that might otherwise be needed while a program is running in a loop, for example.

The reason regular programs refer to so-called virtual addresses, such as “42”, and aren’t allowed to stuff data directly into specific storage cells on specific chips, is to prevent security disasters. (Anyone who coded in the glory days of 1970s home computers, with versions of BASIC that allowed you to sidestep any memory controls in the system, will know how catastrophic an aptly named but ineptly supplied POKE command could be.)
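To make the caching idea concrete, here’s a minimal sketch of how a TLB-style lookup behaves, assuming a toy page size and an invented “page-table walk” (this is illustrative C, not real kernel or hardware code):

    /* Toy sketch of a TLB: a tiny cache of virtual-to-physical translations.
     * All names, sizes and the "page-table walk" are invented for illustration;
     * real CPUs and the Linux kernel are far more complicated. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define TLB_ENTRIES 64
    #define PAGE_SHIFT  12                              /* 4 KiB pages */
    #define PAGE_MASK   ((1ULL << PAGE_SHIFT) - 1)

    struct tlb_entry {
        uint64_t vpn;                                   /* virtual page number  */
        uint64_t pfn;                                   /* physical page number */
        int      valid;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Stand-in for the slow, multi-step page-table walk. */
    static uint64_t page_table_walk(uint64_t vpn)
    {
        return vpn ^ 0x12345;                           /* fake mapping */
    }

    static uint64_t translate(uint64_t vaddr)
    {
        uint64_t vpn  = vaddr >> PAGE_SHIFT;
        size_t   slot = (size_t)(vpn % TLB_ENTRIES);

        /* Fast path: a recently used translation is already cached. */
        if (tlb[slot].valid && tlb[slot].vpn == vpn)
            return (tlb[slot].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);

        /* Slow path: do the expensive walk once, then cache the answer. */
        tlb[slot].vpn   = vpn;
        tlb[slot].pfn   = page_table_walk(vpn);
        tlb[slot].valid = 1;
        return (tlb[slot].pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
    }

    /* Flushing simply marks every cached translation invalid. */
    static void tlb_flush_all(void)
    {
        for (size_t i = 0; i < TLB_ENTRIES; i++)
            tlb[i].valid = 0;
    }

    int main(void)
    {
        /* Repeated accesses to "address #42" reuse the cached entry... */
        for (int i = 0; i < 3; i++)
            printf("phys = %#llx\n", (unsigned long long)translate(42));

        /* ...until the TLB is flushed and the slow walk runs again. */
        tlb_flush_all();
        printf("phys = %#llx\n", (unsigned long long)translate(42));
        return 0;
    }

Flushing those cached entries is cheap to describe but costs performance, because the next access has to redo the slow walk – which, as you’ll see below, is exactly the trade-off at the heart of this bug.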

The drm/i915 bug

Apparently, if we have understood the drm/i915 bug correctly, it can be “tickled” in the following way:

  • User X says, “Do this calculation in the GPU, and use the shared memory buffer Y for the calculations.”
  • Processor builds up a list of TLB entries to help the GPU driver and the user access buffer Y quickly.
  • Kernel finishes the GPU calculations, and returns buffer Y to the system for someone else to use.
  • Kernel doesn’t flush the TLB data that gives user X a “fast track” to some or all parts of buffer Y.
  • User X says, “Run some more code on the GPU,” this time without specifying a buffer of its own.

At this point, even if the kernel maps User X’s second lot of GPU code onto a completely new, system-selected chunk of memory, User X’s GPU code will still be accessing memory via the old TLB entries.

So some of User X’s memory accesses will inadvertently (or deliberately, if X is malevolent) read out data from a stale physical address that no longer belongs to User X.

That memory could contain confidential data stored there by User Z, the new “owner” of buffer Y.

So, User X might be able to sneak a peek at fragments of someone else’s data in real-time, and perhaps even write to some of that data behind the other person’s back.
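In terms of a toy model like the sketch above (again, the names and layout are invented, and bear no resemblance to the real driver), the danger boils down to a cached translation outliving the buffer it was created for:

    /* Toy simulation of a stale translation surviving buffer reuse.
     * Everything here is invented for illustration; it is not how the
     * kernel or the i915 driver manages GPU memory. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 64

    static char physical_pages[4][PAGE_SIZE];   /* fake physical memory */

    /* User X's cached "fast track": buffer Y lives in physical page 2. */
    static int cached_page = 2;
    static int cache_valid = 1;

    static void flush_tlb(void) { cache_valid = 0; }

    int main(void)
    {
        /* Steps 1-2: User X's GPU job uses buffer Y via the cached entry. */
        strcpy(physical_pages[cached_page], "user X's scratch data");

        /* Step 3: the kernel reclaims page 2 and hands it to User Z, who
         * stores something sensitive there. Step 4 (the bug) is skipping
         * the flush below - try commenting it out. */
        flush_tlb();
        strcpy(physical_pages[2], "user Z's secret");

        /* Step 5: User X runs more GPU code. With a stale entry it would
         * read (or overwrite) User Z's data; after a flush it cannot. */
        if (cache_valid)
            printf("User X sees: %s\n", physical_pages[cached_page]);
        else
            printf("Stale entry gone: User X needs a fresh mapping\n");

        return 0;
    }

The real attack surface is messier, of course: the stale entries live in the GPU’s own translation hardware, not in a neat little array.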

Exploitation considered complicated

Clearly, exploiting this bug for cyberattack purposes would be enormously complex.

But it is nevertheless a timely reminder that whenever performance shortcuts are brought into play, such as using a TLB to sidestep the need to re-evaluate memory accesses and thus speed things up, security may be dangerously eroded.

The solution is simple: always invalidate, or flush, the TLB whenever a user finishes running a chunk of code on the GPU. (The previous code waited until someone else wanted to run new GPU code, but didn’t always check in time to suppress the possible access control bypass.)

This ensures that the GPU can’t be used as a “spy probe” to PEEK unlawfully at data that some other program has confidently POKEd into what it assumes is its own, exclusive memory area.
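In outline, the fix amounts to moving the invalidation to the release path, something like this (the function and structure names below are made up for the sketch; see the drm/i915 entry in the kernel changelog for the real patch):

    /* Conceptual before/after of the fix; these are invented names,
     * not the real i915 driver symbols. */
    struct gpu_tlb    { int generation; };
    struct gpu_buffer { struct gpu_tlb *tlb; void *pages; };

    static void return_pages_to_kernel(struct gpu_buffer *buf) { buf->pages = NULL; }
    static void gpu_tlb_invalidate(struct gpu_tlb *tlb)        { tlb->generation++; }

    /* Before: pages could be handed back while translations were still
     * cached; invalidation happened later, and not always in time. */
    static void release_buffer_buggy(struct gpu_buffer *buf)
    {
        return_pages_to_kernel(buf);        /* someone else may get them now */
    }

    /* After: stale GPU translations are flushed *before* the backing
     * pages can be reused by anyone else. */
    static void release_buffer_fixed(struct gpu_buffer *buf)
    {
        gpu_tlb_invalidate(buf->tlb);       /* flush first...               */
        return_pages_to_kernel(buf);        /* ...then give the pages back  */
    }

    int main(void)
    {
        struct gpu_tlb    tlb = { 0 };
        struct gpu_buffer buf = { &tlb, &tlb /* pretend backing pages */ };

        release_buffer_fixed(&buf);         /* flush, then release */
        (void)release_buffer_buggy;         /* kept only for comparison */
        return 0;
    }

Flushing on every release costs a little performance, which is presumably why the eager version wasn’t merged straight away, as noted below.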

Ironically, it looks as though the patch was originally coded back in October 2021, but not added to the Linux source code because of concerns that it might reduce performance, whilst fixing what felt at the time like a “misfeature” rather than an outright bug.

What to do?

  • Upgrade to the latest kernel version. Supported versions with the patch are: 4.4.301, 4.9.299, 4.14.264, 4.19.227, 5.4.175, 5.10.95, 5.15.18 and 5.16.4.
  • If your Linux doesn’t have the latest kernel version, check with your distro maintainer to see if this patch has been “backported” anyway.