Re: [PATCH 14/24] workqueue: Generalize unbound CPU pods

From: Tejun Heo
Date: Mon Aug 07 2023 - 21:09:03 EST


Hello,

On Mon, Jul 31, 2023 at 01:52:21PM -1000, Tejun Heo wrote:
> On Tue, Jul 11, 2023 at 08:32:27AM +0530, K Prateek Nayak wrote:
> > > Yeah, that's a bit surprising given that in terms of affinity behavior
> > > "numa" should be identical to base. The only meaningful differences that I
> > > can think of is when the work item is assigned to its worker and maybe how
> > > pwq max_active limit is applied. Hmm... can you monitor the number of
> > > kworker kthreads while running the benchmark? No need to do the whole
> > > matrix, just comparing base against numa should be enough.
> >
> > Sure. I'll get back to you with the data soon.
>
> Any updates? I'd like to proceed with the patchset as it helps resolving
> problems others are reporting. I can try to reproduce the results too if you
> can share more details on how they're run.

Prateek sent me how he tested along with workqueue traces. I tried to
reproduce on an AMD zen2 machine and here are the findings:

* The test has a high run-to-run variance. Even with cpufreq boost turned
off, the numbers reported every second within each run are relatively
stable, but adjacent runs can report significantly different numbers. Maybe
initial thread placement has lingering effects?

On ryzen 3900x, 15 runs of `./tbench -c ./client.txt -t 60 32 127.0.0.1`:

Before: AVG=9066.43 STDEV=42.65
After : AVG=9076.11 STDEV=60.50

Given the stdev, I don't think this is indicating any meaningful
difference.

* I looked at what was consuming CPUs during the benchmark runs and also at
Prateek's workqueue traces. None of the operations that tbench is doing
directly involves workqueue. I couldn't find a mechanism by which workqueue
differences would cause any meaningful performance differences.
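
As a rough sanity check on the Before/After numbers above, the difference
in averages can be compared against the run-to-run noise with a Welch
t-statistic. This is only a sketch under assumptions: it takes the 15 runs
to apply to each configuration, treats the reported STDEV as the sample
standard deviation, and uses a hypothetical helper name (welch_t). It is
not how the numbers above were produced.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t-statistic and degrees of freedom from summary stats.

    Hypothetical helper for illustration; sd_* are assumed to be
    sample standard deviations.
    """
    var_a = sd_a ** 2 / n_a  # variance of the mean, configuration A
    var_b = sd_b ** 2 / n_b  # variance of the mean, configuration B
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Assumption: 15 runs for each of Before and After.
t, df = welch_t(9066.43, 42.65, 15, 9076.11, 60.50, 15)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under those assumptions t comes out at roughly 0.5, which is far below
what would normally be read as a real difference between the two
configurations.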

At least for tbench results, I couldn't find any signal.

Thanks.

--
tejun