Re: [PATCH v2 2/5] Introducing qpw_lock() and per-cpu queue & flush work

From: Vlastimil Babka (SUSE)

Date: Mon Mar 09 2026 - 06:14:35 EST


On 3/8/26 19:00, Leonardo Bras wrote:
> On Tue, Mar 03, 2026 at 01:02:13PM -0300, Marcelo Tosatti wrote:
>> On Tue, Mar 03, 2026 at 01:03:36PM +0100, Vlastimil Babka (SUSE) wrote:
>> > On 3/2/26 16:49, Marcelo Tosatti wrote:
>> > > +#define local_qpw_lock(lock) \
>> > > + do { \
>> > > + if (static_branch_maybe(CONFIG_QPW_DEFAULT, &qpw_sl)) { \
>> > > + migrate_disable(); \
>> >
>> > Have you considered using migrate_disable() on PREEMPT_RT and
>> > preempt_disable() on !PREEMPT_RT since it's cheaper? It's what the pcp
>> > locking in mm/page_alloc.c does, for that reason. It should reduce the
>> > overhead with qpw=1 on !PREEMPT_RT.
>>
>> migrate_disable:
>> Patched kernel, CONFIG_QPW=y, qpw=1: 192 cycles
>>
>> preempt_disable:
>> [ 65.497223] kmalloc_bench: Avg cycles per kmalloc: 184 cycles
>>
>> I tried it before, but it was crashing for some reason which I didn't
>> look into (perhaps PREEMPT_RT was enabled).
>>
>> Will change this for the next iteration, thanks.
>>
>
> Hi all,
>
> That made me remember that rt spinlock already uses migrate_disable and
> non-rt spinlocks already have preempt_disable()
>
> Maybe it's actually worth adding a local_spin_lock() in spinlock{,_rt}.c
> which would get the per-cpu variable inside the preempt/migrate_disable
> area, and making use of it in qpw code. That way we avoid nesting
> migrate_disable or preempt_disable, further reducing the impact.

That would be nice indeed. But since the nested disable/enable cost should
be low, and the spinlock code is rather complicated, it might be tough to sell.
It would also be great to have those trylocks inline on all arches.

> The alternative is to not have migrate/preempt disable here and actually
> trust the ones inside the locking primitives. Is there a chance of
> contention, but I don't remember being able to detect it.

So we could pick the lock on one cpu but then get migrated and actually
lock it on another cpu. Is contention the only possible downside of this, or
could it lead to subtle bugs depending on the particular user? The paths
that don't flush stuff on remote cpus but expect to be working with the local
cpu's structure in a fastpath might get broken. I'd be wary of this.

> Thanks!
> Leo