Re: [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr()

From: Sean Christopherson
Date: Fri Oct 13 2023 - 17:02:33 EST


On Fri, Oct 13, 2023, Uros Bizjak wrote:
> On Fri, Oct 13, 2023 at 6:04 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 11, 2023, Uros Bizjak wrote:
> > > Additionally, the patch introduces 'rdgsbase' alternative for CPUs with
> > > X86_FEATURE_FSGSBASE. The rdgsbase instruction *probably* will end up
> > > only decoding in the first decoder etc. But we're talking single-cycle
> > > kind of effects, and the rdgsbase case should be much better from
> > > a cache perspective and might use fewer memory pipeline resources to
> > > offset the fact that it uses an unusual front end decoder resource...
> >
> > The switch to RDGSBASE should be a separate patch, and should come with actual
> > performance numbers.
>
> This *is* the patch to switch to RDGSBASE. The propagation of
> arguments is a nice side-effect of the patch, due to the explicit
> addition of the offset addend to the %gs base. This patch is an
> alternative implementation of [1]
>
> [1] x86/percpu: Use C for arch_raw_cpu_ptr(),
> https://lore.kernel.org/lkml/20231010164234.140750-1-ubizjak@xxxxxxxxx/

Me confused, can't you first switch to MOV with tcp_ptr__ += (unsigned long)(ptr),
and then introduce the RDGSBASE alternative?
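
To illustrate the ordering I mean, here is a rough userspace sketch, not kernel
code. Everything named fake_* or sketch_* is made up for the example, and the
real thing would use inline asm plus an alternative rather than a plain C
variable. The point is only that the explicit add is identical in both steps,
so the choice of where the base comes from can land as a separate patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for one CPU's per-CPU area. */
struct fake_cpu_area {
	unsigned char data[64];
};

/*
 * Hypothetical stand-in for the per-CPU base. In the kernel this would be
 * this_cpu_off, read via a %gs-relative MOV; here it is an ordinary global.
 * The kernel uses unsigned long; uintptr_t keeps the sketch portable.
 */
static uintptr_t fake_this_cpu_off;

/* Step 1: load the base with a plain load, then add the offset in C. */
static inline void *sketch_cpu_ptr_mov(uintptr_t ptr)
{
	uintptr_t tcp_ptr__ = fake_this_cpu_off;	/* models the MOV */

	tcp_ptr__ += ptr;	/* explicit add, visible to the compiler */
	return (void *)tcp_ptr__;
}

/* Hypothetical model of RDGSBASE: just returns the same base value. */
static inline uintptr_t fake_rdgsbase(void)
{
	return fake_this_cpu_off;
}

/* Step 2: only the source of the base changes, the add is untouched. */
static inline void *sketch_cpu_ptr_rdgsbase(uintptr_t ptr)
{
	uintptr_t tcp_ptr__ = fake_rdgsbase();

	tcp_ptr__ += ptr;
	return (void *)tcp_ptr__;
}
```

With fake_this_cpu_off pointed at some struct fake_cpu_area, both helpers
return the same address for the same offset, which is the property that would
let the second step be reviewed and benchmarked on its own.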

> Unfortunately, I have no idea on how to measure the impact of such a
> low-level feature, so I'll at least need some guidance. The "gut
> feeling" says that special instruction, intended to support the
> feature, is always better than emulating said feature with a memory
> access.

AIUI, {RD,WR}{FS,GS}BASE were added as faster alternatives to {RD,WR}MSR, not to
accelerate actual accesses to per-CPU data, TLS, etc. E.g. loading a 64-bit base
via a MOV to FS/GS is impossible. And presumably saving a userspace-controlled
base by actually accessing FS/GS is dangerous for one reason or another.

The instructions are guarded by a CR4 bit; the ucode cost just to check CR4.FSGSBASE
is probably non-trivial.