Re: [PATCH 0/4] Introduce QPW for per-cpu operations
From: Leonardo Bras
Date: Sat Feb 14 2026 - 16:36:57 EST
On Wed, Feb 11, 2026 at 09:11:21AM -0300, Marcelo Tosatti wrote:
> On Wed, Feb 11, 2026 at 09:01:12AM -0300, Marcelo Tosatti wrote:
> > On Tue, Feb 10, 2026 at 03:01:10PM +0100, Michal Hocko wrote:
> > > On Fri 06-02-26 11:34:30, Marcelo Tosatti wrote:
> > > > The problem:
> > > > Some places in the kernel implement a parallel programming strategy
> > > > consisting of local_locks() for most of the work, while the rare remote
> > > > operations are scheduled on the target cpu. This keeps cache bouncing
> > > > low, since the cacheline tends to stay local, and avoids the cost of
> > > > locks in non-RT kernels, even though the few remote operations are
> > > > expensive due to scheduling overhead.
> > > >
> > > > On the other hand, for RT workloads this can represent a problem: getting
> > > > an important workload scheduled out to deal with remote requests is
> > > > sure to introduce unexpected deadline misses.
> > > >
> > > > The idea:
> > > > Currently with PREEMPT_RT=y, local_locks() become per-cpu spinlocks.
> > > > In this case, instead of scheduling work on a remote cpu, it should
> > > > be safe to grab that remote cpu's per-cpu spinlock and run the required
> > > > work locally. The major cost, the un/locking in every local function,
> > > > is already paid on PREEMPT_RT.
> > > >
> > > > Also, there is no need to worry about extra cache bouncing:
> > > > The cacheline invalidation already happens due to schedule_work_on().
> > > >
> > > > This avoids schedule_work_on(), and thus avoids scheduling out an
> > > > RT workload.
> > > >
> > > > Proposed solution:
> > > > A new interface called Queue PerCPU Work (QPW), which should replace
> > > > the workqueue in the above-mentioned use case.
> > > >
> > > > If PREEMPT_RT=n, this interface just wraps the current
> > > > local_locks + workqueue behavior, so no change in runtime is expected.
> > > >
> > > > If PREEMPT_RT=y, or CONFIG_QPW=y, queue_percpu_work_on(cpu,...) will
> > > > lock that cpu's per-cpu structure and perform work on it locally.
> > > > This is possible because, in functions that may perform work on remote
> > > > per-cpu structures, the local_lock (which on PREEMPT_RT is already a
> > > > this-cpu spinlock) is replaced by a qpw_spinlock(), which can take the
> > > > per-cpu spinlock of the cpu passed as a parameter.
> > >
> > > What about !PREEMPT_RT? We have people running isolated workloads and
> > > these sorts of pcp disruptions are really unwelcome as well. They do not
> > > have requirements as strong as RT workloads but the underlying
> > > fundamental problem is the same. Frederic (now CCed) is working on
> > > moving those pcp bookkeeping activities to be executed on return to
> > > userspace, which should take care of both RT and non-RT
> > > configurations AFAICS.
> >
> > Michal,
> >
> > For !PREEMPT_RT, _if_ you select CONFIG_QPW=y, then there is a kernel
> > boot option qpw=y/n, which controls whether the behaviour matches
> > PREEMPT_RT (i.e. the spinlock is taken on local_lock).
> >
> > If CONFIG_QPW=n, or kernel boot option qpw=n, then only local_lock
> > (and remote work via work_queue) is used.
>
> OK, this is not true. There is only CONFIG_QPW and the qpw=yes/no kernel
> boot option for control.
>
> CONFIG_PREEMPT_RT should probably select CONFIG_QPW=y and
> CONFIG_QPW_DEFAULT=y.
Fully agree :)
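
In Kconfig terms I read that as something like the sketch below (option
names taken from your mail; the PREEMPT_RT entry is abridged, and whether
select or a `default y if PREEMPT_RT` fits better is up to the patch):

```
config PREEMPT_RT
	bool "Fully Preemptible Kernel (Real-Time)"
	...
	select QPW
	select QPW_DEFAULT
```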
>
> > What "pcp bookkeeping activities" do you refer to? I don't see how
> > moving certain activities that happen under SLUB or LRU spinlocks
> > to before return to userspace changes anything with respect to
> > avoiding CPU interruption?
> >
> > Thanks
> >
>