Re: [PATCH 1/2] sched: Optimise task_mm_cid_work duration

From: Mathieu Desnoyers
Date: Mon Dec 02 2024 - 10:02:28 EST


On 2024-12-02 09:56, Gabriele Monaco wrote:
> Hi Mathieu,
>
> thanks for the quick reply.

>> Thanks for looking into this. I understand that you are after
>> minimizing the latency introduced by task_mm_cid_work on isolated
>> cores. I think we'll need to think a bit harder, because the
>> proposed solution does not work:

>>   * for_each_cpu_from - iterate over CPUs present in @mask, from @cpu
>>     to the end of @mask.
>>
>> cpu is uninitialized. So this is completely broken.
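
For illustration, the difference between the two iterators can be modelled in userspace. This is only a sketch: the toy_* names, the 64-bit mask and TOY_NR_CPUS are assumptions made for the example, not the kernel's cpumask implementation. It shows that a "from" style walk starts at whatever value the cursor already holds, which is why an uninitialized cursor makes the loop meaningless.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64 /* hypothetical size, one bit per CPU in a uint64_t */

/* Return the first set bit in mask at position >= start, or TOY_NR_CPUS. */
static int toy_next_cpu(uint64_t mask, int start)
{
	for (int cpu = start; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/* Toy analogue of for_each_cpu: the walk always starts at bit 0. */
#define toy_for_each_cpu(cpu, mask) \
	for ((cpu) = toy_next_cpu((mask), 0); (cpu) < TOY_NR_CPUS; \
	     (cpu) = toy_next_cpu((mask), (cpu) + 1))

/* Toy analogue of for_each_cpu_from: the walk starts at the current cpu. */
#define toy_for_each_cpu_from(cpu, mask) \
	for ((cpu) = toy_next_cpu((mask), (cpu)); (cpu) < TOY_NR_CPUS; \
	     (cpu) = toy_next_cpu((mask), (cpu) + 1))

/* Count the CPUs visited by a full walk over mask. */
static int toy_count_all(uint64_t mask)
{
	int cpu, n = 0;

	toy_for_each_cpu(cpu, mask)
		n++;
	return n;
}

/* Count the CPUs visited when the cursor is preset to start. */
static int toy_count_from(uint64_t mask, int start)
{
	int cpu = start, n = 0;

	/* The cursor must be a valid position before the walk begins. */
	assert(start >= 0 && start <= TOY_NR_CPUS);
	toy_for_each_cpu_from(cpu, mask)
		n++;
	return n;
}
```

With a mask holding CPUs 1, 4 and 9, toy_count_all() visits all three, while toy_count_from() visits only those at or after the preset cursor. With an uninitialized cursor the starting point is indeterminate.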

> My bad, wrong macro. It should be for_each_cpu.

>> Was this tested against a workload that actually uses concurrency
>> IDs to ensure it does not break the whole thing? Did you run the
>> rseq selftests?


> I did run the stress-ng --rseq command for a while and didn't see any
> error reported, but it's probably not bulletproof. I'll use the
> selftests for the next iterations.

>> Also, the mm_cidmask is a mask of concurrency IDs, not a mask of
>> CPUs. So using it to iterate on CPUs is wrong.
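
To make the distinction concrete, here is a simplified userspace model. It assumes concurrency IDs are handed out as the lowest free number; struct toy_mm and every toy_* name are hypothetical and do not mirror the kernel's data layout. The point is that the set of IDs in use and the set of CPUs holding an ID are different bitmaps, so the first cannot stand in for the second when choosing which CPUs to visit.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS   8    /* hypothetical CPU count for the model */
#define TOY_CID_UNSET (-1) /* marks a CPU that holds no concurrency ID */

struct toy_mm {
	int     pcpu_cid[TOY_NR_CPUS]; /* ID held on each CPU, or TOY_CID_UNSET */
	uint8_t cidmask;               /* bit n set <=> concurrency ID n in use */
};

/* Start with no ID allocated anywhere. */
static void toy_mm_init(struct toy_mm *mm)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		mm->pcpu_cid[cpu] = TOY_CID_UNSET;
	mm->cidmask = 0;
}

/* Hand the lowest free concurrency ID to cpu (assumed allocation policy). */
static int toy_cid_get(struct toy_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	for (int cid = 0; cid < TOY_NR_CPUS; cid++) {
		if (!(mm->cidmask & (1u << cid))) {
			mm->cidmask |= (uint8_t)(1u << cid);
			mm->pcpu_cid[cpu] = cid;
			return cid;
		}
	}
	return TOY_CID_UNSET;
}

/* Bitmask of the CPUs that actually hold an ID: what a scan must visit. */
static uint8_t toy_cpus_holding_cid(const struct toy_mm *mm)
{
	uint8_t cpus = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mm->pcpu_cid[cpu] != TOY_CID_UNSET)
			cpus |= (uint8_t)(1u << cpu);
	return cpus;
}
```

If threads run on CPUs 2, 5 and 7, the model's cidmask has bits 0, 1 and 2 set, while the CPUs to visit are 2, 5 and 7. Walking CPUs by cidmask would touch CPUs 0, 1 and 2 and miss all three that matter.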


> Mmh, I get it. During my tests I was definitely getting better results
> than with the mm_cpus_allowed mask, but I guess that was a broken test,
> so it just doesn't count.
> Do you think using mm_cpus_allowed would make more sense, with the
> /risk/ of being a bit over-cautious?

mm_cpus_allowed can be updated dynamically by setting cpu affinity
and changing the cpusets. If we change the iteration from all possible
cpus to allowed cpus, then we need to adapt the allowed cpus updates
to perform the associated mm_cid updates as well. This adds
complexity.

I understand that you wish to offload this task_work to a non-isolated
CPU (non-RT). If you do so, do you really care about the duration of
task_mm_cid_work enough to justify the added complexity to the
cpu affinity/cpusets updates?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com