Re: [PATCH v2 0/5] workqueue: Introduce a sharded cache affinity scope

In reply to: Breno Leitao: "[PATCH v2 3/5] workqueue: set WQ_AFFN_CACHE_SHARD as the default affinity scope"

From: Chuck Lever

Date: Mon Mar 23 2026 - 10:29:01 EST

On Fri, Mar 20, 2026, at 1:56 PM, Breno Leitao wrote:
> TL;DR: Some modern processors have many CPUs per LLC (L3 cache), and
> unbound workqueues using the default affinity (WQ_AFFN_CACHE) collapse
> to a single worker pool, causing heavy spinlock (pool->lock) contention.
> Create a new affinity (WQ_AFFN_CACHE_SHARD) that caps each pool at
> wq_cache_shard_size CPUs (default 8).
>
> Changes from RFC:
>
> * wq_cache_shard_size is in terms of cores (not vCPU). So,
> wq_cache_shard_size=8 means the pool will have 8 cores and their siblings,
> like 16 threads/CPUs if SMT=1

My concern about the "cores per shard" approach is that it
does little or nothing to improve the default situation for
moderately sized machines.

A machine with one L3 and 10 cores will go from 1 UNBOUND
pool to only 2. Virtual machines commonly deployed as cloud
instances are 2-, 4-, or 8-core systems (up to 16 threads).
Each of those fits within a single 8-core shard, so they
still get one UNBOUND pool and will still see significant
contention for UNBOUND workers.

IOW, if you want good scaling, human intervention (via a
boot command-line option) is still needed.

--
Chuck Lever