Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

From: Tejun Heo
Date: Wed May 10 2023 - 15:19:27 EST


On Wed, May 10, 2023 at 11:57:41AM -0700, Brian Norris wrote:
> Test case: iperf TCP RX (i.e., hits "MWIFIEX_RX_WORK_QUEUE" a lot) at
> some of the higher (VHT 80 MHz) data rates.
> Hardware: Mediatek MT8173 2xA53 (little) + 2xA72 (big) CPU
> (I'm not familiar with its cache details)
> +
> Marvell SD8897 SDIO WiFi (mwifiex_sdio)

Yeah, we've had multiple similar cases on what I think are similar
configurations, which is why I'm working on improving workqueue locality.

> We're looking at a major regression from our 4.19 kernel to a 5.15
> kernel (yeah, that's downstream reality). So far, we've found that
> performance is:

That's curious. 4.19 is old but I scanned the history and there's nothing
which can cause that kind of perf regression for unbound workqueues between
4.19 and 5.15.

> (1) much better (nearly the same as 4.19) if we add WQ_SYSFS and pin the
> work queue to one CPU (doesn't really matter which CPU, as long as it's
> not the one loaded with IRQ(?) work)
> (2) moderately better if we pin the CPU frequency (e.g., "performance"
> cpufreq governor instead of "schedutil")
> (3) moderately better (not quite as good as (2)) if we switch to a
> kthread_worker and don't pin anything.

Hmm... so it's not just workqueue.

> We tried (2) because we saw a lot more CPU migration on kernel 5.15
> (work moves across all 4 CPUs throughout the run; on kernel 4.19 it
> mostly switched between 2 CPUs).

Workqueue can contribute to this but it seems more likely that scheduling
changes are also part of the story.

> We tried (3) suspecting some kind of EAS issue (instead of distributing
> our workload onto 4 different kworkers, our work (and therefore our load
> calculation) is mostly confined to a single kernel thread). But it still
> seems like our issues are more than "just" EAS / cpufreq issues, since
> (2) and (3) aren't as good as (1).
> NB: there weren't many relevant mwifiex or MTK-SDIO changes in this
> range.
> So we're still investigating a few other areas, but it does seem like
> "locality" (in some sense of the word) is relevant. We'd probably be
> open to testing any patches you have, although it's likely we'd have the
> easiest time if we can port those to 5.15. We're constantly working on
> getting good upstream support for Chromebook chips, but ARM SoC reality
> is that it still varies a lot as to how much works upstream on any given
> system.

I should be able to post the patchset later today or tomorrow. It comes with
sysfs knobs to control affinity scopes and strictness, so you should
hopefully be able to find a configuration that works without too much