Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Linus Torvalds
Date: Wed Oct 18 2023 - 16:34:48 EST


On Wed, 18 Oct 2023 at 13:22, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> And yes, sometimes we use actual volatile accesses for them
> (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> and are much too strict. Not only does gcc generally lose its mind
> when it sees volatile (ie it stops doing various sane combinations
> that would actually be perfectly valid), but it obviously also stops
> doing CSE on the loads (as it has to).
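
For reference, a simplified sketch of that kind of volatile access. READ_ONCE_SKETCH is a made-up name, it assumes the GNU __typeof__ extension, and the real kernel macro does more than this cast. The point is only that both volatile loads in sum_twice_volatile() have to be emitted, while the compiler is free to load once in the plain version:

```c
/* Hypothetical, simplified stand-in for a READ_ONCE()-style access:
 * force the load to go through a volatile-qualified lvalue. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Two volatile loads: the compiler may not combine them (no CSE). */
int sum_twice_volatile(int *p)
{
	return READ_ONCE_SKETCH(*p) + READ_ONCE_SKETCH(*p);
}

/* Two plain loads: the compiler is allowed to load once and reuse the value. */
int sum_twice_plain(int *p)
{
	return *p + *p;
}
```

Comparing the "gcc -O2 -S" output for the two functions should show the difference.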

Note, in case you wonder what I mean by "lose its mind", try this
(extremely stupid) test program:

void a(volatile int *i) { ++*i; }
void b(int *i) { ++*i; }

and note that the non-volatile version does

addl $1, (%rdi)

but the volatile version then refuses to combine the read+write into a
rmw instruction, and generates

movl (%rdi), %eax
addl $1, %eax
movl %eax, (%rdi)

instead.

Sure, it's correct, but it's an example of how 'volatile' ends up
disabling a lot more optimizations than just the "don't remove the
access" one.

Doing the volatile as one rmw instruction would still have been very
obviously valid - it's still doing a read and a write. You don't need
two instructions for that.
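
To make that concrete, here is a rough sketch of forcing the single-instruction form by hand with inline asm. The name inc_rmw is made up for illustration, it assumes GCC or clang extended asm on x86, and other architectures just fall back to the plain volatile increment:

```c
/* Hypothetical helper, not taken from the kernel:
 * increment *i using a single read-modify-write instruction. */
static inline void inc_rmw(volatile int *i)
{
#if defined(__x86_64__) || defined(__i386__)
	/* "+m" tells the compiler *i is both read and written in memory;
	 * "cc" notes that the flags are modified. */
	__asm__ __volatile__("addl $1, %0" : "+m" (*i) : : "cc");
#else
	/* Fallback for non-x86 builds: the ordinary volatile increment. */
	++*i;
#endif
}
```

That is still one read and one write of the volatile object, just done by one instruction.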

I'm not complaining, and I understand *why* it happens - compiler
writers very understandably go "oh, I'm not touching that".

I'm just trying to point out that volatile really screws up code
generation even aside from the "access _exactly_ once" issue.

So using inline asm and relying on gcc doing (minimal) CSE will then
generate better code than volatile ever could, even when we just use a
simple 'mov' instruction. At least you get that basic combining
effect, even if it's not great.
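
A minimal sketch of that pattern, with made-up names (load_asm and sum_twice_asm are not kernel functions) and assuming GCC or clang extended asm on x86. Because the asm is not marked volatile and has no "memory" clobber, the compiler is allowed to treat two instances with identical inputs as one:

```c
/* Hypothetical helper: load an int through a non-volatile inline asm. */
static inline int load_asm(const int *p)
{
	int val;
#if defined(__x86_64__) || defined(__i386__)
	/* No "volatile", no "memory" clobber: the output depends only on
	 * the "m" input, so identical asm statements may be merged. */
	__asm__("movl %1, %0" : "=r" (val) : "m" (*p));
#else
	/* Fallback for non-x86 builds: a plain load. */
	val = *p;
#endif
	return val;
}

/* Two loads of the same location that the compiler may combine. */
int sum_twice_asm(const int *p)
{
	return load_asm(p) + load_asm(p);
}
```

Whether the two loads actually get merged depends on the compiler version and flags, so check the generated assembly rather than assuming it.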

And for memory ops, *not* using volatile is dangerous when the values
they access aren't stable.

Linus