Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Linus Torvalds
Date: Thu Oct 12 2023 - 13:47:36 EST


On Thu, 12 Oct 2023 at 10:10, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The fix seems to be a simple one-liner, ie just
>
> - asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]") \
> + asm(__pcpu_op2_##size(op, __percpu_arg(a[var]), "%[val]") \

Nope. That doesn't work at all.

It turns out that we're not the only ones that didn't know about the
'a' modifier.

clang has also never heard of it in this context, and the above
one-liner results in an endless sea of errors, with

    error: invalid operand in inline asm: 'movq %gs:${1:a}, $0'

Looking around, I think it's X86AsmPrinter::PrintAsmOperand() that is
supposed to handle these things, and while it does have some handling
for 'a', the comment around it says

    case 'a': // This is an address. Currently only 'i' and 'r' are expected.

and I think our use ends up just confusing the heck out of clang. Of
course, clang also does this:

    case 'P': // This is the operand of a call, treat specially.
      PrintPCRelImm(MI, OpNo, O);
      return false;

so clang *already* generates those 'current' accesses as PC-relative,
and I see

    movq %gs:pcpu_hot(%rip), %r13

in the generated code.

End result: clang actually generates what we want just using 'P', and
the whole "P vs a" is only a gcc thing.

Why *does* gcc do that silly thing of dropping '(%rip)' from the address, btw?

Linus