Hi Mathieu,
thanks for the quick reply.
> Thanks for looking into this. I understand that you are after minimizing the
> latency introduced by task_mm_cid_work on isolated cores. I think we'll need
> to think a bit harder, because the proposed solution does not work:
>
>  * for_each_cpu_from - iterate over CPUs present in @mask, from @cpu
>    to the end of @mask.
>
> cpu is uninitialized. So this is completely broken.
My bad, wrong macro. It should be for_each_cpu.
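To spell out the difference, here is a minimal userspace sketch, not kernel
code. The names toy_for_each_cpu, toy_for_each_cpu_from, next_cpu, count_all
and count_from are hypothetical stand-ins that walk the bits of a plain
unsigned long rather than a struct cpumask. The point is only that the
"_from" variant reads the cursor before writing it, so the caller has to
initialize it.

```c
#include <stdio.h>

#define NR_CPUS 8

/* Return the first set bit of mask at position >= cpu, or NR_CPUS if none.
 * Assumes cpu >= 0. */
static int next_cpu(int cpu, unsigned long mask)
{
	for (; cpu < NR_CPUS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return NR_CPUS;
}

/* Starts at bit 0: the cursor is initialized by the macro itself. */
#define toy_for_each_cpu(cpu, mask)				\
	for ((cpu) = next_cpu(0, (mask)); (cpu) < NR_CPUS;	\
	     (cpu) = next_cpu((cpu) + 1, (mask)))

/* Starts at the current value of the cursor: the caller must set it first. */
#define toy_for_each_cpu_from(cpu, mask)			\
	for ((cpu) = next_cpu((cpu), (mask)); (cpu) < NR_CPUS;	\
	     (cpu) = next_cpu((cpu) + 1, (mask)))

/* Count every CPU set in mask. */
static int count_all(unsigned long mask)
{
	int cpu, n = 0;

	toy_for_each_cpu(cpu, mask)
		n++;
	return n;
}

/* Count the CPUs set in mask starting at 'start' (inclusive). */
static int count_from(int start, unsigned long mask)
{
	int cpu = start, n = 0;	/* without this initialization the walk is undefined */

	toy_for_each_cpu_from(cpu, mask)
		n++;
	return n;
}
```

With a mask that has CPUs 2, 5 and 7 set (0xA4), count_all() visits all three,
while count_from(5, ...) only visits 5 and 7. An uninitialized cursor would
start the walk at an arbitrary position.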
> Was this tested against a workload that actually uses concurrency IDs to
> ensure it does not break the whole thing? Did you run the rseq selftests?
I did run the stress-ng --rseq command for a while and didn't see any
error reported, but it's probably not bulletproof. I'll use the
selftests for the next iterations.
> Also, the mm_cidmask is a mask of concurrency IDs, not a mask of CPUs. So
> using it to iterate on CPUs is wrong.
Hmm, I get it. During my tests I was definitely getting better results
than with the mm_cpus_allowed mask, but I guess that was a broken test,
so it just doesn't count.
Do you think using mm_cpus_allowed would make more sense, with the
/risk/ of being a bit over-cautious?
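
To check my understanding of why the cidmask cannot stand in for a CPU mask,
here is a toy userspace model, not kernel code. toy_cid_alloc and
toy_cidmask_for are hypothetical helpers, and the real allocation is more
involved. The sketch only assumes that concurrency IDs are handed out
compactly, starting from the lowest free one.

```c
#include <stdio.h>

#define NR_CPUS 8

/* Toy model: hand out the lowest free concurrency ID and mark it in cidmask.
 * Returns the allocated ID, or -1 if all IDs are taken. */
static int toy_cid_alloc(unsigned long *cidmask)
{
	for (int cid = 0; cid < NR_CPUS; cid++) {
		if (!(*cidmask & (1UL << cid))) {
			*cidmask |= 1UL << cid;
			return cid;
		}
	}
	return -1;
}

/* Build the cidmask of a process whose threads currently run on the CPUs
 * set in running_cpus: one concurrency ID is allocated per running CPU. */
static unsigned long toy_cidmask_for(unsigned long running_cpus)
{
	unsigned long cidmask = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (running_cpus & (1UL << cpu))
			toy_cid_alloc(&cidmask);
	return cidmask;
}
```

In this model, threads running on CPUs 5 and 7 (0xA0) end up with IDs 0 and 1,
so the cidmask is 0x3. Walking it as if it were a CPU mask would visit CPUs 0
and 1 and miss both CPUs that actually matter.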