Re: [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope
From: Chuck Lever
Date: Mon Mar 23 2026 - 10:29:01 EST
On Fri, Mar 20, 2026, at 1:56 PM, Breno Leitao wrote:
> TL;DR: Some modern processors have many CPUs per LLC (L3 cache), and
> unbound workqueues using the default affinity (WQ_AFFN_CACHE) collapse
> to a single worker pool, causing heavy spinlock (pool->lock) contention.
> Create a new affinity (WQ_AFFN_CACHE_SHARD) that caps each pool at
> wq_cache_shard_size CPUs (default 8).
>
> Changes from RFC:
>
> * wq_cache_shard_size is in terms of cores (not vCPU). So,
> wq_cache_shard_size=8 means the pool will have 8 cores and their siblings,
> like 16 threads/CPUs if SMT=1

My concern about the "cores per shard" approach is that it
does little or nothing to improve the default situation on
moderately sized machines.

A machine with one L3 and 10 cores goes from one UNBOUND
pool to only two. Virtual machines commonly deployed as
cloud instances have 2, 4, or 8 cores (up to 16 threads),
so each fits within a single shard: they keep one UNBOUND
pool and will still see significant contention for UNBOUND
workers.

IOW, if you want good scaling, human intervention (via a
boot command-line option) is still needed.
--
Chuck Lever