Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
From: Linus Torvalds
Date: Wed Oct 18 2023 - 16:34:48 EST
On Wed, 18 Oct 2023 at 13:22, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> And yes, sometimes we use actual volatile accesses for them
> (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> and are much too strict. Not only does gcc generally lose its mind
> when it sees volatile (ie it stops doing various sane combinations
> that would actually be perfectly valid), but it obviously also stops
> doing CSE on the loads (as it has to).
Note, in case you wonder what I mean by "lose its mind", try this
(extremely stupid) test program:
    void a(volatile int *i) { ++*i; }
    void b(int *i) { ++*i; }
and note that the non-volatile version does
    addl $1, (%rdi)
but the volatile version then refuses to combine the read+write into a
rmw instruction, and generates
    movl (%rdi), %eax
    addl $1, %eax
    movl %eax, (%rdi)
instead.
Sure, it's correct, but it's an example of how 'volatile' ends up
disabling a lot more optimizations than just the "don't remove the
access" one.
Doing the volatile as one rmw instruction would still have been very
obviously valid - it's still doing a read and a write. You don't need
two instructions for that.
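For anyone who wants to reproduce the comparison, here is a minimal
sketch of the same two functions as a standalone file. The file name
and the build flags in the comment are assumptions, and the exact
instructions you get depend on the compiler and its version.

```c
/*
 * volatile-rmw.c (hypothetical file name)
 *
 * Build with something like "gcc -O2 -S volatile-rmw.c" and compare the
 * assembly emitted for a() and b(). Both functions have the same effect
 * on the pointed-to int; only the generated code may differ.
 */

/* Volatile-qualified pointee: the read and the write are each a
 * volatile access. */
void a(volatile int *i) { ++*i; }

/* Plain pointee: the compiler is free to emit one read-modify-write
 * instruction. */
void b(int *i) { ++*i; }
```

Calling either function on an int holding 41 leaves 42 behind, so any
difference is purely in code generation.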
I'm not complaining, and I understand *why* it happens - compiler
writers very understandably go "oh, I'm not touching that".
I'm just trying to point out that volatile really screws up code
generation even aside from the "access _exactly_ once" issue.
So using inline asm and relying on gcc doing (minimal) CSE will then
generate better code than volatile ever could, even when we just use a
simple 'mov' instruction. At least you get that basic combining
effect, even if it's not great.
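To make the inline asm point concrete, here is a rough sketch, not the
actual percpu code. The helper names (asm_load, volatile_load and the
two sum functions) are made up for illustration, and the asm path
assumes x86 with a plain C fallback elsewhere. Because the asm is not
marked volatile and only takes *p as an "m" input, the compiler is
allowed to merge the two identical loads in sum_twice_asm(), while the
volatile version has to perform both.

```c
#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical helper: a load done through non-volatile inline asm.
 * With no "volatile" and no "memory" clobber, identical asm statements
 * with identical inputs may be combined by the compiler. */
static inline int asm_load(const int *p)
{
    int v;
    __asm__("movl %1, %0" : "=r" (v) : "m" (*p));
    return v;
}
#else
/* Fallback for other architectures so the sketch still builds. */
static inline int asm_load(const int *p)
{
    return *p;
}
#endif

/* Hypothetical helper: the same load as a volatile access. */
static inline int volatile_load(const int *p)
{
    return *(const volatile int *)p;
}

/* The two asm loads are candidates for CSE. */
int sum_twice_asm(const int *p)
{
    return asm_load(p) + asm_load(p);
}

/* Both volatile loads must be performed. */
int sum_twice_volatile(const int *p)
{
    return volatile_load(p) + volatile_load(p);
}
```

Both functions return the same value when nothing else modifies *p;
whether the asm loads really get merged is up to the compiler and the
optimization level.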
And for memory ops, *not* using volatile is dangerous when the memory they access isn't stable.
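As a hedged illustration of that danger: in a polling loop, a
non-volatile asm load is something the compiler may hoist out of the
loop, since it believes the result cannot change. Marking the asm
volatile keeps the load inside the loop. The names asm_load_once and
poll_until_set are hypothetical, and the asm path again assumes x86
with a volatile C access as the fallback.

```c
#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical helper: "asm volatile" tells the compiler not to delete
 * this load, merge it with another one, or move it out of a loop. */
static inline int asm_load_once(const int *p)
{
    int v;
    __asm__ volatile("movl %1, %0" : "=r" (v) : "m" (*p));
    return v;
}
#else
/* Fallback for other architectures: a plain volatile access. */
static inline int asm_load_once(const int *p)
{
    return *(const volatile int *)p;
}
#endif

/* Spin until *flag becomes nonzero or max_spins loads have all seen
 * zero. Returns the number of loads that saw zero. The flag is assumed
 * to be set by something the compiler cannot see, such as an interrupt
 * handler or another CPU. */
int poll_until_set(const int *flag, int max_spins)
{
    int spins = 0;

    while (spins < max_spins && !asm_load_once(flag))
        spins++;
    return spins;
}
```

Real code would also need to think about memory ordering; this sketch
only shows the "do the load every time" part.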
Linus