Re: [RFC PATCH v4] sched: Fix performance regression introduced by mm_cid
From: Mathieu Desnoyers
Date: Thu Apr 13 2023 - 11:37:24 EST
On 2023-04-13 11:20, Peter Zijlstra wrote:
On Thu, Apr 13, 2023 at 09:56:38AM -0400, Mathieu Desnoyers wrote:
Mathieu, WDYT? -- other than that the patch is an obvious hack :-)
I hate it with passion :-)
It is quite specific to your workload/configuration.
If we take for instance a process with a large mm_users count which is
eventually affined to a subset of the cpus with cpusets or
sched_setaffinity, your patch will prevent compaction of the concurrency ids
when it really should not.
I don't think it will, it will only kick in once the highest cid is
handed out (I should've used num_online_cpus() instead of nr_cpu_ids),
and with affinity at play that should never happen.
So in that case, this optimization will only work if affinity is not
set. E.g. a hackbench with cpuset or sched_setaffinity excluding one
core from the set will still be slower.
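
To make the trigger condition concrete, here is a rough userspace sketch (not
kernel code). It assumes cids are handed out compactly, so the highest cid in
use is one less than the number of threads that can run concurrently. The
toy_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: highest cid a compact allocator would hand out,
 * assuming at most min(nr_allowed_cpus, mm_users) threads of the mm run
 * concurrently.
 */
static int toy_highest_cid(int nr_allowed_cpus, int mm_users)
{
	assert(nr_allowed_cpus > 0 && mm_users > 0);
	int concurrent = nr_allowed_cpus < mm_users ? nr_allowed_cpus : mm_users;

	return concurrent - 1;
}

/*
 * Hypothetical model of the condition discussed above: the hack only
 * kicks in once the highest possible cid (nr_online_cpus - 1) is handed out.
 */
static bool toy_hack_kicks_in(int nr_online_cpus, int nr_allowed_cpus,
			      int mm_users)
{
	return toy_highest_cid(nr_allowed_cpus, mm_users) == nr_online_cpus - 1;
}
```

With 8 online cpus and 16 threads, toy_hack_kicks_in(8, 8, 16) is true, but
excluding a single core (toy_hack_kicks_in(8, 7, 16)) means the highest cid is
never reached, which is the hackbench case described above.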
Now, the more fancy scheme with:
min(t->nr_cpus_allowed, atomic_read(&t->mm->mm_users))
that does get to be more complex; and I've yet to find a working version
that doesn't also need a for_each_cpu() loop for reclaim :/
Indeed. And with an allowed-cpus approach, we need to carefully consider
what happens if we change an allowed cpu mask from one set to another
set, e.g., from allowing cpus 0, 1 to allowing only cpus 2, 3. There will
be task migration, and we need to reclaim the cids from 0, 1, but we can
very well be in a case where the number of mm_users is above the number
of allowed cpus.
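
A toy userspace model of that scenario, as a sketch only: it assumes each cpu
keeps the cid it was handed until that cid is explicitly reclaimed, and that
allocation picks the lowest free cid. The toy_* names, TOY_NR_CPUS and the
struct layout are made up for illustration and are not the kernel's
data structures.

```c
#include <assert.h>

#define TOY_NR_CPUS	4
#define TOY_CID_UNSET	(-1)

/* Hypothetical model: one cid slot per cpu for a single mm. */
struct toy_mm {
	int cpu_cid[TOY_NR_CPUS];	/* cid held by each cpu, or TOY_CID_UNSET */
	unsigned int cid_used;		/* bitmask of allocated cids */
};

/* The bound discussed above: min(nr_cpus_allowed, mm_users). */
static int toy_cid_bound(int nr_cpus_allowed, int mm_users)
{
	return nr_cpus_allowed < mm_users ? nr_cpus_allowed : mm_users;
}

static void toy_mm_init(struct toy_mm *mm)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		mm->cpu_cid[cpu] = TOY_CID_UNSET;
	mm->cid_used = 0;
}

/* Return the cid already held by this cpu, else take the lowest free one. */
static int toy_cid_get(struct toy_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (mm->cpu_cid[cpu] != TOY_CID_UNSET)
		return mm->cpu_cid[cpu];
	for (int cid = 0; cid < TOY_NR_CPUS; cid++) {
		if (!(mm->cid_used & (1u << cid))) {
			mm->cid_used |= 1u << cid;
			mm->cpu_cid[cpu] = cid;
			return cid;
		}
	}
	return TOY_CID_UNSET;
}

/* Reclaim the cid held by this cpu, if any. */
static void toy_cid_put(struct toy_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	int cid = mm->cpu_cid[cpu];

	if (cid == TOY_CID_UNSET)
		return;
	mm->cid_used &= ~(1u << cid);
	mm->cpu_cid[cpu] = TOY_CID_UNSET;
}

/*
 * Allowed mask goes from {0, 1} to {2, 3}. Returns the highest cid in
 * use on the new cpus, with or without reclaiming the cids of cpus 0, 1.
 */
static int toy_affinity_switch(int reclaim)
{
	struct toy_mm mm;

	toy_mm_init(&mm);
	toy_cid_get(&mm, 0);
	toy_cid_get(&mm, 1);
	if (reclaim) {
		toy_cid_put(&mm, 0);
		toy_cid_put(&mm, 1);
	}
	int a = toy_cid_get(&mm, 2);
	int b = toy_cid_get(&mm, 3);

	return a > b ? a : b;
}
```

With 4 mm_users and 2 allowed cpus, toy_cid_bound(2, 4) is 2, so cids should
stay below 2. toy_affinity_switch(1) ends with a highest cid of 1, whereas
toy_affinity_switch(0) ends with 3 because the cids of cpus 0, 1 were never
reclaimed.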
Anyway, I think the hack as presented is safe, but a hack none-the-less.
I don't think it is _unsafe_, but it will only trigger in specific
scenarios, which makes it harder to understand more subtle performance
regressions for scenarios that are not covered by this hack.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com