Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Uros Bizjak
Date: Wed Oct 11 2023 - 16:01:00 EST


On Wed, Oct 11, 2023 at 9:52 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 11 Oct 2023 at 11:42, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > The attached patch was tested on a target with fsgsbase CPUID and
> > without it. It works!
>
> .. I should clearly read all my emails before answering some of them.
>
> Yes, that patch looks good to me, and I'm happy to hear that you
> actually tested it unlike my "maybe something like this".
>
> > The patch improves amd_pmu_enable_virt() in the same way as reported
> > in the original patch submission and also reduces the number of percpu
> > offset reads (either from this_cpu_off or with rdgsbase) from 1663 to
> > 1571.
>
> Dio y ou have any actka performance numbers? The patch looks good to
> me, and I *think* rdgsbase ends up being faster in practice due to
> avoiding a memory access, but that's very much a gut feel.

Unfortunately, I don't have any perf numbers, only those from Agner's
instruction tables. The memory access performance has so many
parameters, that gut feeling is the only thing besides real
case-by-case measurements. The rule of thumb in the compiler world is
also that memory access should be avoided.

Uros.

>
> > The only drawback is a larger binary size:
> >
> > text data bss dec hex filename
> > 25546594 4387686 808452 30742732 1d518cc vmlinux-new.o
> > 25515256 4387814 808452 30711522 1d49ee2 vmlinux-old.o
> >
> > that increases by 31k (0.123%), probably due to 1578 rdgsbase alternatives.
>
> I'm actually surprised that it increases the text size. The 'rdgsbase'
> instruction should be smaller than a 'mov %gs', so I would have
> expected the *data* size to increase due to the alternatives tables,
> but not the text size.
>
> [ Looks around ]
>
> Oh. It's because we put the altinstructions into the text section.
> That's kind of silly, but whatever.
>
> So I think that increase in text-size is not "real" - yes, it
> increases our binary size because we obviously have two instructions,
> but the actual *executable* part likely stays the same, and it's just
> that we grow the altinstruction metadata.
>
> Linus