Re: [mm/page_alloc] f26b3fa046: netperf.Throughput_Mbps -18.0% regression

From: Linus Torvalds
Date: Tue May 10 2022 - 15:25:42 EST


On Tue, May 10, 2022 at 12:03 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I think the PV case already basically does that - replacing the
> "store release" with a much more complex sequence. No?

Looking around, the PV case is absolutely horrid, and does a
cmpxchg_release() on the unlock path. Yeah, that would make the unlock
*much* more expensive.
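
To make the cost difference concrete, here is a minimal userspace sketch
in C11 atomics. It is not the kernel code: the struct, the function names
and the LOCKED_VAL/SLOW_VAL values are all made up for illustration, and
the "native" unlock is assumed to be a plain store-release, as the quoted
text implies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKED_VAL 1u   /* hypothetical: held, nobody needs a kick */
#define SLOW_VAL   3u   /* hypothetical: held, a waiter needs a kick */

struct toy_lock {
	_Atomic uint8_t locked;
};

/* Take the lock in the uncontended case only (enough for this sketch). */
static void toy_lock_uncontended(struct toy_lock *l)
{
	uint8_t old = atomic_exchange_explicit(&l->locked, LOCKED_VAL,
					       memory_order_acquire);
	assert(old == 0);	/* the sketch never models contention */
}

/* Native-style unlock: a single store with release ordering. */
static void toy_unlock_native(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * PV-style unlock: an atomic compare-and-exchange even when nobody is
 * waiting. Returns true if the slow path (kicking a waiter) was needed.
 */
static bool toy_unlock_pv(struct toy_lock *l)
{
	uint8_t expected = LOCKED_VAL;

	if (atomic_compare_exchange_strong_explicit(&l->locked, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed))
		return false;	/* fast path, but still a locked RMW */

	/* Slow path: a waiter changed the byte; release, then kick it. */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
	return true;
}
```

On x86 the first variant compiles to an ordinary store, while the second
needs a locked cmpxchg on every unlock, contended or not.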

And I guess that's fairly fundamental. Even if you were to avoid an
explicitly atomic access - do the unlock as a non-atomic write followed
by a non-atomic "read pending and see if we need to do something
expensive" - just that check would have to involve at a minimum a
memory barrier, so it ends up being expensive even for the
non-contended case.
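
A rough sketch of that "write, then read pending" variant, again in
userspace C11 atomics and not taken from the kernel. The struct, the
"pending" field and both function names are hypothetical. The point is
only that the store to "locked" and the later load of "pending" are to
different locations, so without a full fence between them the load may
be satisfied before the store is visible to the waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_wait_lock {
	atomic_int locked;	/* 1 while held */
	atomic_int pending;	/* hypothetical: 1 if a waiter may be asleep */
};

/*
 * Unlock with a plain release store, then check whether a waiter needs
 * waking. Returns true if the caller must take the expensive path.
 */
static bool toy_unlock_and_check(struct toy_wait_lock *l)
{
	/* Debug-only sanity check; compiled out with NDEBUG. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);

	atomic_store_explicit(&l->locked, 0, memory_order_release);

	/*
	 * Store->load ordering: release alone does not give it, so a full
	 * barrier is needed here even when nobody is waiting.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&l->pending, memory_order_relaxed) != 0;
}

/*
 * Waiter side of the same handshake: announce intent to sleep, fence,
 * then re-check the lock. Returns true if the lock is still held.
 */
static bool toy_waiter_should_sleep(struct toy_wait_lock *l)
{
	atomic_store_explicit(&l->pending, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&l->locked, memory_order_relaxed) != 0;
}
```

With the fences on both sides, either the unlocker sees "pending" or the
waiter sees the lock released, so a wakeup cannot be lost. Dropping the
fence from the unlock path would make it cheap again, but would break
exactly that guarantee.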

Linus