Re: [PATCH RFC 0/5] workqueue: add WQ_AFFN_CACHE_SHARD affinity scope

From: Tejun Heo

Date: Fri Mar 13 2026 - 14:00:40 EST


Hello,

Applied 1/5. Some comments on the rest:

- The sharding currently splits on CPU boundary, which can split SMT
siblings across different pods. The worse performance on Intel compared
to the SMT scope may indicate exactly this - HT siblings ending up in
different pods. It'd be better to shard on core boundary so that SMT
siblings always stay together.

- How was the default shard size of 8 picked? There's a tradeoff between
the number of kworkers created and locality. Can you also report the
number of kworkers for each configuration? And is there data on
different shard sizes? It'd be useful to see how the numbers change
across e.g. 4, 8, 16, 32.

- Can you also test on AMD machines? Their CCD topology (16 or 32
threads per LLC) would be a good data point.

Thanks.

--
tejun