Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Linus Torvalds
Date: Tue Oct 10 2023 - 14:52:49 EST


On Tue, 10 Oct 2023 at 11:41, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> Yes, but does it CSE the load from multiple addresses?

Yes, it should do that just right, because the *asm* itself is
identical, just the offsets (that gcc then adds separately) would be
different.

This is not unlike how we depend on gcc CSE'ing the "current" part
when doing multiple accesses of different members off that:

	static __always_inline struct task_struct *get_current(void)
	{
		return this_cpu_read_stable(pcpu_hot.current_task);
	}

with this_cpu_read_stable() being an inline asm that lacks the memory
component (the same way the fallback hides it by just using
"%%gs:this_cpu_off" directly inside the asm, instead of exposing it as
a memory access to gcc).
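
To make that concrete outside the kernel, here is a minimal user-space sketch of the same pattern. Everything named demo_* is a hypothetical stand-in (it is not the kernel's task_struct or get_current()), and the empty asm only models "an asm whose text and inputs are identical and which has no memory operand or clobber". Whether the compiler actually merges the two instances depends on the optimization level; the code is correct either way.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct; not the kernel's layout. */
struct demo_task {
	int pid;
	unsigned long flags;
};

static struct demo_task demo_current = { .pid = 42, .flags = 0x10 };

/*
 * Stand-in for get_current(). The asm is not volatile, has no "memory"
 * clobber and no memory operand, so the compiler sees it as a pure
 * function of its input and is allowed to merge identical instances.
 */
static inline __attribute__((always_inline))
struct demo_task *demo_get_current(void)
{
	struct demo_task *p;

	/* Empty asm: the "0" constraint ties the input to the output register. */
	__asm__("" : "=r" (p) : "0" (&demo_current));
	return p;
}

static inline unsigned long demo_pid_plus_flags(void)
{
	/*
	 * Two calls, two different member offsets. The asm is identical
	 * in both, so it may be emitted once, with the offsets added
	 * separately by the compiler.
	 */
	return demo_get_current()->pid + demo_get_current()->flags;
}
```

Comparing the "gcc -O2 -S" output with and without a "memory" clobber added to the asm is one way to see whether the two instances get merged.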

Of course, I think that with the "__seg_gs" patches, we *could* expose
the "%%gs:this_cpu_off" part to gcc, since gcc hopefully then can do
the alias analysis on that side and see that it can CSE the thing
anyway.

That might be a better choice than __FORCE_ORDER, in fact.

IOW, something like

	static __always_inline unsigned long new_cpu_offset(void)
	{
		unsigned long res;
		asm(ALTERNATIVE(
			"movq " __percpu_arg(1) ",%0",
			"rdgsbase %0",
			X86_FEATURE_FSGSBASE)
			: "=r" (res)
			: "m" (this_cpu_off));
		return res;
	}

would presumably work together with your __seg_gs stuff.

UNTESTED!!

Linus