Re: [PATCH 0/4] Introduce QPW for per-cpu operations

From: Frederic Weisbecker

Date: Tue Mar 03 2026 - 06:01:06 EST


On Mon, Feb 23, 2026 at 10:18:40AM +0100, Michal Hocko wrote:
> On Fri 20-02-26 11:30:16, Marcelo Tosatti wrote:
> > On Thu, Feb 19, 2026 at 08:30:31PM +0100, Michal Hocko wrote:
> > > On Thu 19-02-26 12:27:23, Marcelo Tosatti wrote:
> [...]
> > > and delayed pcp work that might disturb such a workload
> > > after it has returned to userspace. Right?
> > > That is usually housekeeping work that, for performance reasons, doesn't
> > > happen in hot paths while the workload was executing in kernel
> > > space.
> > >
> > > There are more ways to deal with that. You can either change the hot
> > > path to not require deferred operation (tricky without introducing
> > > regressions for most workloads) or you can define a more suitable place
> > > to perform the housekeeping while still running in the kernel.
> > >
> > > Your QPW work relies on the local_lock -> spin_lock transition and
> > > on performing the pcp work remotely, so you do not need to disturb
> > > that remote cpu. Correct?
> > >
> > > An alternative approach is to define a moment when the housekeeping
> > > operation is performed on that local cpu while still running in
> > > kernel space - e.g. when returning to userspace. Delayed work is
> > > then not necessary and the workload is not disrupted after it has
> > > returned to userspace.
> > >
> > > Do I make more sense or does the above sound like a complete gibberish?
> >
> > OK, sure, but I can't see how you can do that with per-CPU caches for
> > kmalloc, for example.
>
> As we have discussed in the other subthread: by flushing those pcp
> caches on the return to userspace. Those flushes are not needed
> immediately. They just need to happen to allow the operations listed by
> Vlastimil to finish. Or we could avoid the problem by not using those
> caches at all, but that is a separate discussion.
>
> I believe we can establish that any pcp delayed operation implemented
> through WQs can be flushed on the way back to userspace, right? The
> performance might be suboptimal but correctness will be preserved.
> So doing this on isolated CPUs could be an alternative to making changes
> to the pcp WQ handling.
>
> I haven't checked the WQ code deeply, but I believe it should be feasible
> to flush all pcp WQs with pending work on the isolated cpu when the
> isolated workload returns to userspace. This way we wouldn't need to
> special case each and every one of them.

If you look at flush_scheduled_work(), there is a big compile time warning
to discourage its use, because it is too deadlock-prone.

But even if we flushed only some specific relevant workqueues, I'm not sure
that would help. The problem is not so much the handling of locally queued
work: on a preempt kernel, that would most likely trigger a context switch
right away to handle the work. The problem is more about remote queuing
while the CPU is running critical code in userspace.

Also, such a flush or local queueing would involve multitasking and a tick
restart. We could handle that by forcing the tick to be stopped again when
entering userspace, but that doesn't simplify the picture.

Thanks.

--
Frederic Weisbecker
SUSE Labs