Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking insns

From: Uros Bizjak
Date: Wed Mar 05 2025 - 14:48:12 EST


On Wed, Mar 5, 2025 at 6:04 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 4 Mar 2025 at 22:54, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > Even to my surprise, the patch has some noticeable effects on the
> > performance, please see the attachment in [1] for LMBench data or [2]
> > for some excerpts from the data. So, I think the patch has potential
> > to improve the performance.
>
> I suspect some of the performance difference - which looks
> unexpectedly large - is due to having run them on a CPU with the
> horrendous indirect return costs, and then inlining can make a huge
> difference.
>
> Regardless, I absolutely think that using asm_inline here is the right
> thing for the locked instructions.

The CPU is an "Intel(R) Core(TM) i7-10710U".

> That said, I do want to bring up another issue: maybe it's time to
> just retire the LOCK_PREFIX thing entirely?
>
> It harkens back to Ye Olde Days when UP was the norm, and we didn't
> want to pay the cost of lock prefixes when the kernel was built for
> SMP but was run on an UP machine.
>
> And honestly, none of that makes sense any more. You can't buy a UP
> machine any more, and the only UP case would be some silly minimal
> virtual environment, and if people really care about that minimal
> case, they should just compile the kernel without SMP support.
> Because UP has gone from being the default to being irrelevant. At
> least for x86-64.
>
> So I think we should just get rid of LOCK_PREFIX_HERE and the
> smp_locks section entirely.

Please note that this functionality is shared with the i386 target, so
the removal proposed above would penalize 32-bit targets somewhat. The
situation w.r.t. UP vs SMP is less clear there: some distros may still
ship i386 SMP kernels that would then run unoptimized on UP systems.
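
For reference, the mechanism under discussion works roughly like this.
Each LOCK_PREFIX records the address of its lock byte in the .smp_locks
section. When the kernel runs on a single CPU, it can rewrite those 0xF0
bytes into a harmless DS segment-override prefix (0x3E), and it restores
0xF0 if more CPUs come online. The sketch below is a user-space model of
that idea only, not kernel code. demo_text, demo_smp_locks and the
demo_* helpers are hypothetical. The real table holds self-relative
offsets rather than indices, and the kernel patches live text through
its own text-poking machinery.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOCK_BYTE 0xF0 /* x86 LOCK prefix */
#define DEMO_DS_BYTE   0x3E /* DS segment-override prefix, harmless here */

/* Hypothetical "kernel text": two locked instructions and one plain one. */
static uint8_t demo_text[] = {
	0xF0, 0xFF, 0x07,       /* lock incl (%rdi)       */
	0xFF, 0x07,             /* incl (%rdi), no prefix */
	0xF0, 0x0F, 0xC1, 0x07, /* lock xadd %eax,(%rdi)  */
};

/* Hypothetical .smp_locks table: where each recorded LOCK byte lives.
 * (The kernel stores self-relative offsets; plain indices suffice here.) */
static const size_t demo_smp_locks[] = { 0, 5 };
#define DEMO_NLOCKS (sizeof(demo_smp_locks) / sizeof(demo_smp_locks[0]))

/* Rewrite every recorded prefix byte equal to `from` into `to`;
 * returns how many bytes were changed. */
static size_t demo_patch_locks(uint8_t from, uint8_t to)
{
	size_t changed = 0;

	for (size_t i = 0; i < DEMO_NLOCKS; i++) {
		uint8_t *p = &demo_text[demo_smp_locks[i]];

		if (*p == from) {
			*p = to;
			changed++;
		}
	}
	return changed;
}

/* Running on a single CPU: drop the bus locks. */
static size_t demo_smp_unlock(void)
{
	return demo_patch_locks(DEMO_LOCK_BYTE, DEMO_DS_BYTE);
}

/* A second CPU appears: put the LOCK prefixes back. */
static size_t demo_smp_lock(void)
{
	return demo_patch_locks(DEMO_DS_BYTE, DEMO_LOCK_BYTE);
}
```

Only bytes listed in the table are touched, so the unprefixed incl at
offset 3 is never modified. Retiring LOCK_PREFIX_HERE would remove both
the table and this patching step.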

From the compiler POV, now that the "lock; " prefix has lost its semicolon,
removing LOCK_PREFIX_HERE or using asm_inline would result in exactly
the same code. The problematic 31k code size increase (1.1%) with -O2
is inevitable either way, if we want to move forward.
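
To see why the two options produce identical code, it helps to look at
how GCC sizes an asm statement when it makes inlining decisions. As far
as I know, it counts one "instruction" per logical line, splitting at
'\n' and at the target's statement separator (';' on x86). For an asm
inline statement, it clamps the count to one. The sketch below is a
simplified model of that heuristic, not GCC's code. The demo_* names are
hypothetical, and DEMO_LOCK_PREFIX_HERE is only modeled on the x86
alternative.h macro, so its exact text may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: count logical asm lines, split at '\n' and ';'. */
static int demo_asm_str_count(const char *templ)
{
	int count = 1;

	if (!*templ)
		return 0;
	for (; *templ; templ++)
		if (*templ == '\n' || *templ == ';')
			count++;
	return count;
}

/* Inliner cost of one asm statement; asm inline counts as minimum size. */
static int demo_asm_cost(const char *templ, bool is_asm_inline)
{
	int count = demo_asm_str_count(templ);

	if (count > 1000)          /* cap very long asm statements */
		count = 1000;
	if (is_asm_inline && count > 1)
		count = 1;
	return count > 1 ? count : 1;
}

/* Approximation of the LOCK_PREFIX_HERE bookkeeping (hypothetical copy). */
#define DEMO_LOCK_PREFIX_HERE			\
	".pushsection .smp_locks,\"a\"\n"	\
	".balign 4\n"				\
	".long 671f - .\n"			\
	".popsection\n"				\
	"671:"

static const char demo_old_incl[]  = DEMO_LOCK_PREFIX_HERE "\n\tlock; incl %0";
static const char demo_new_incl[]  = DEMO_LOCK_PREFIX_HERE "\n\tlock incl %0";
static const char demo_bare_incl[] = "lock incl %0";
```

Under this model, a one-instruction atomic op with the LOCK_PREFIX
bookkeeping looks six instructions long to the inliner, or seven with
the old "lock; ". Dropping LOCK_PREFIX_HERE (leaving "lock incl") and
marking the statement asm inline both bring the cost down to one.
Dropping LOCK_PREFIX_HERE alone while keeping "lock; " would still cost
two.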

My proposal would be to use what a modern compiler offers. By using
asm_inline, we can keep the status quo (mostly for i386) for some more
time, and still move forward. And we get that -Os code size *decrease*
as a bonus for those that want to shave the last byte from the kernel.
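
A minimal sketch of what using asm_inline on a locked instruction looks
like follows. It assumes an x86 target and GCC 9 or later for the
"asm inline" qualifier. demo_asm_inline is a hypothetical macro that
mirrors what the kernel's asm_inline does. demo_atomic_t and
demo_atomic_inc() are hypothetical stand-ins for atomic_t and the arch
increment helper. The LOCK_PREFIX/.smp_locks bookkeeping is left out,
other compilers fall back to plain asm, and non-x86 targets use a
__atomic builtin instead.

```c
/* Use "asm inline" where GCC supports it (GCC 9+), else plain asm. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
#define demo_asm_inline __asm__ __inline
#else
#define demo_asm_inline __asm__
#endif

typedef struct { int counter; } demo_atomic_t;

static inline void demo_atomic_inc(demo_atomic_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Locked increment; with asm inline, the inliner treats this
	 * statement as minimum size regardless of its template length. */
	demo_asm_inline __volatile__("lock incl %0"
				     : "+m" (v->counter)
				     :
				     : "memory");
#else
	/* Non-x86 fallback: compiler builtin with full ordering. */
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}
```

The emitted instruction is unchanged. Only the inliner's size estimate
differs, so callers of such helpers are more likely to be inlined.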

Thanks,
Uros.