Re: [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope

From: Breno Leitao

Date: Mon Mar 23 2026 - 13:20:11 EST


Hello Chuck,

On Mon, Mar 23, 2026 at 11:28:49AM -0400, Chuck Lever wrote:
> On 3/23/26 11:10 AM, Breno Leitao wrote:
> >
> > I am not convinced. The wq_cache_shard_size approach creates multiple
> > pools on large systems while leaving small systems (<8 cores) unchanged.
>
> This is exactly my concern. Smaller systems /do/ experience measurable
> contention in this area. I don't object to your series at all, it's
> clean and well-motivated; but the cores-per-shard approach doesn't scale
> down to very commonly deployed machine sizes.

I don't see why the cores-per-shard approach wouldn't scale down
effectively.

The sharding mechanism itself is independent of whether we use
cores-per-shard or shards-per-LLC as the allocation strategy, correct?

Regardless of the approach, we retain full control over the granularity
of the shards.

> We might also argue that the NFS client and other subsystems that make
> significant use of UNBOUND workqueues in their I/O paths might be well
> advised to modify their approach. (net/sunrpc/sched.c, hint hint)
>
>
> > This eliminates the pathological lock contention we're observing on
> > high-core-count machines without impacting smaller deployments.
>
> > In contrast, splitting pools per LLC would force fragmentation even on
> > systems that aren't experiencing contention, increasing the need for
> > manual tuning across a wider range of configurations.
>
> I claim that smaller deployments also need help. Further, I don't see
> how UNBOUND pool fragmentation is a problem that needs to be addressed
> on such systems (IMHO).

Are you suggesting we should reduce the default from
wq_cache_shard_size=8 to something like wq_cache_shard_size=2?

Thanks for the feedback,
--breno