Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Nadav Amit
Date: Tue Oct 10 2023 - 14:26:02 EST

> On Oct 10, 2023, at 9:22 PM, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> On Tue, Oct 10, 2023 at 7:32 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, 10 Oct 2023 at 09:43, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>>>
>>> Implementing arch_raw_cpu_ptr() in C allows the compiler to perform
>>> better optimizations, such as setting an appropriate base to compute
>>> the address instead of an add instruction.
>>
>> Hmm. I wonder..
>>
>>> + tcp_ptr__ = __raw_cpu_read(, this_cpu_off) + (unsigned long)(ptr); \
>>
>> Do we really even want to use __raw_cpu_read(this_cpu_off) at all?
>
> Please note that besides propagation of the addition into address, the
> patch also exposes memory load to the compiler, with the anticipation
> that the compiler CSEs the load from this_cpu_off from eventual
> multiple addresses. For this to work, we have to get rid of the asms.
> It is important that the compiler knows that this is a memory load, so
> it can also apply other compiler magic to it.
>
> BTW: A follow-up patch will also use __raw_cpu_read to implement
> this_cpu_read_stable. We can then read "const aliased" current_task to
> CSE the load even more, something similar to [1].

I was just writing the same thing. :)

As a minor note, the proposed assembly version seems to be missing
__FORCE_ORDER as an input argument, which is needed to prevent reordering
past preempt_enable() and preempt_disable(). But that’s really not the
main point.
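
To make the main point concrete, here is a rough userspace sketch (GCC/Clang
only, not kernel code). All names in it (this_cpu_off_demo, areas,
raw_cpu_ptr_c, raw_cpu_ptr_opaque) are hypothetical stand-ins, and the empty
asm only models the opacity of an inline-asm implementation; it is not the
asm from the patch. With the plain C version the compiler sees an ordinary
load and addition, so it may reuse the loaded offset and fold the addition
into the address. With the asm version it has to execute the asm statement
for every call and cannot see through it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: two "per-CPU" copies of a struct, and a variable
 * playing the role of this_cpu_off. */
struct pair { int a; int b; };
static struct pair areas[2];
static uintptr_t this_cpu_off_demo;

/* Plain C: an ordinary load plus an addition. The compiler may reuse the
 * loaded offset across calls when nothing in between can modify it. */
static inline void *raw_cpu_ptr_c(void *ptr)
{
	return (void *)(this_cpu_off_demo + (uintptr_t)ptr);
}

/* Opaque model: the empty asm volatile must be executed on every call and
 * hides the value of 'off', so the two computations cannot be merged. */
static inline void *raw_cpu_ptr_opaque(void *ptr)
{
	uintptr_t off = this_cpu_off_demo;

	__asm__ volatile("" : "+r"(off));
	return (void *)(off + (uintptr_t)ptr);
}

/* Two accesses through the C helper: one load of the offset can serve both. */
static int sum_c(void)
{
	int *a = raw_cpu_ptr_c(&areas[0].a);
	int *b = raw_cpu_ptr_c(&areas[0].b);

	return *a + *b;
}

/* Same accesses through the opaque helper: the asm runs twice. */
static int sum_opaque(void)
{
	int *a = raw_cpu_ptr_opaque(&areas[0].a);
	int *b = raw_cpu_ptr_opaque(&areas[0].b);

	return *a + *b;
}

/* Both variants compute the same result; only the generated code differs. */
static void demo_selfcheck(void)
{
	this_cpu_off_demo = sizeof(struct pair);	/* select areas[1] */
	areas[1].a = 3;
	areas[1].b = 4;
	assert(sum_c() == 7);
	assert(sum_opaque() == 7);
}
```

Comparing the output of "gcc -O2 -S" for sum_c() and sum_opaque() should show
the difference, although the exact code depends on the compiler version.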