Re: [PATCH] rcu: Use system_unbound_wq to avoid disturbing isolated CPUs

From: Waiman Long
Date: Thu Jul 25 2024 - 15:53:29 EST


On 7/25/24 15:33, Neeraj Upadhyay wrote:
On Thu, Jul 25, 2024 at 01:02:01PM -0400, Waiman Long wrote:
On 7/25/24 11:35, Neeraj Upadhyay wrote:
On Tue, Jul 23, 2024 at 02:10:25PM -0400, Waiman Long wrote:
It was discovered that isolated CPUs could sometimes be disturbed by
kworkers processing kfree_rcu() work items, causing higher than expected
latency. This is because the RCU core uses "system_wq", which doesn't have
the WQ_UNBOUND flag, to handle all its work items. Fix this violation of
latency limits by using "system_unbound_wq" in the RCU core instead.
This will ensure that those work items will not be run on CPUs marked
as isolated.
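
To make the placement difference concrete, here is a toy userspace C model
(not kernel code) of where a work item queued with WORK_CPU_UNBOUND ends up
running. All names in it (model_pick_cpu(), the MODEL_WQ_* kinds, the
housekeeping bitmask) are made up for illustration. The rule that an
unbound work item runs on the lowest-numbered housekeeping CPU is a
simplifying assumption; the real kernel lets the scheduler place unbound
workers anywhere within the allowed unbound cpumask.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
#define MODEL_NR_CPUS 8

enum model_wq_kind {
	MODEL_WQ_BOUND,		/* per-CPU pool, like system_wq */
	MODEL_WQ_CPU_INTENSIVE,	/* per-CPU pool with WQ_CPU_INTENSIVE */
	MODEL_WQ_UNBOUND,	/* unbound pool, like system_unbound_wq */
};

/*
 * Return the CPU that executes a work item queued from @queuing_cpu,
 * or -1 if no housekeeping CPU is available for an unbound item.
 * Bit N of @housekeeping_mask set means CPU N is a housekeeping CPU.
 */
static int model_pick_cpu(enum model_wq_kind kind, int queuing_cpu,
			  unsigned int housekeeping_mask)
{
	int cpu;

	/* Per-CPU pools run the item on the CPU that queued it. */
	if (kind != MODEL_WQ_UNBOUND)
		return queuing_cpu;

	/* Assumption: pick the lowest-numbered housekeeping CPU. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (housekeeping_mask & (1u << cpu))
			return cpu;
	return -1;
}

/* True if the chosen CPU is outside the housekeeping set. */
static bool model_disturbs_isolated(enum model_wq_kind kind, int queuing_cpu,
				    unsigned int housekeeping_mask)
{
	int cpu = model_pick_cpu(kind, queuing_cpu, housekeeping_mask);

	return cpu >= 0 && !(housekeeping_mask & (1u << cpu));
}

/* Example: CPUs 0-1 are housekeeping, the item is queued from CPU 5. */
static void model_example(void)
{
	assert(model_disturbs_isolated(MODEL_WQ_BOUND, 5, 0x3u));
	assert(model_disturbs_isolated(MODEL_WQ_CPU_INTENSIVE, 5, 0x3u));
	assert(!model_disturbs_isolated(MODEL_WQ_UNBOUND, 5, 0x3u));
}
```

In this model, a work item queued from isolated CPU 5 stays on CPU 5 for a
per-CPU workqueue, with or without the CPU-intensive attribute, and moves
to a housekeeping CPU only for the unbound one.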

An alternative approach, in case we want to keep per-CPU worker pools,
would be to define a wq with the WQ_CPU_INTENSIVE flag. Are there cases
where a WQ_CPU_INTENSIVE wq won't be sufficient for the problem this patch
is fixing?
What exactly will we gain by defining a WQ_CPU_INTENSIVE workqueue? Or what
will we lose by using system_unbound_wq? All the calls that are modified to
use system_unbound_wq are using WORK_CPU_UNBOUND as their cpu. IOW, they
don't care which CPUs are used to run the work items. The only downside I
can see is the possible loss of some cache locality.

For the nohz_full case, where unbound pool workers run only on the
housekeeping CPU (cpu0), if multiple other CPUs are queuing work, the
execution of those work items could get delayed. However, this should not
generally happen as other CPUs would be mostly running in user mode.
Well, if there is only one housekeeping CPU, a lot of background kernel
tasks will be slowed down. Users should be careful about the proper
balance between the number of housekeeping and nohz_full CPUs.


In fact, WQ_CPU_INTENSIVE can be considered a subset of WQ_UNBOUND. A
WQ_UNBOUND workqueue will avoid using isolated CPUs, but a
WQ_CPU_INTENSIVE workqueue will not.
Got it, thanks!

I have picked up the patch for further review and testing [1].


[1] https://git.kernel.org/pub/scm/linux/kernel/git/neeraj.upadhyay/linux-rcu.git/log/?h=next

Thanks, let me know if you see any problem.

Cheers,
Longman