Re: [RFC PATCH v2] membarrier: expedited private command

From: Avi Kivity
Date: Tue Aug 01 2017 - 06:32:55 EST




On 08/01/2017 01:22 PM, Peter Zijlstra wrote:

> > If mm cpumask is used, I think it's okay. You can cause quite similar
> > kind of iteration over CPUs and lots of IPIs, tlb flushes, etc using
> > munmap/mprotect/etc, or context switch IPIs, etc. Are we reaching the
> > stage where we're controlling those kinds of ops in terms of impact
> > to the rest of the system?
>
> So x86 has a tight mm_cpumask(), we only broadcast TLB invalidate IPIs
> to those CPUs actually running threads of our process (or very
> recently). So while there can be the sporadic stray IPI for a CPU that
> recently ran a thread of the target process, it will not get another one
> until it switches back into the process.
>
> On machines that need manual TLB broadcasts and don't keep a tight mask,
> yes you can interfere at will, but if they care they can fix by
> tightening the mask.
>
> In either case, the mm_cpumask() will be bounded by the set of CPUs the
> threads are allowed to run on and will not interfere with the rest of
> the system.
>
> As to scheduler IPIs, those are limited to the CPUs the user is limited
> to and are rate limited by the wakeup-latency of the tasks. After all,
> all the time a task is runnable but not running, wakeups are no-ops.
>
> Trouble is of course, that not everybody even sets a single bit in
> mm_cpumask() and those that never clear bits will end up with a fairly
> wide mask, still interfering with work that isn't hard partitioned.

I hate to propose a way to make this more complicated, but this could be
fixed by having a process first declare its intent to use expedited
process-wide membarrier. If it does, every context switch updates a
process-wide cpumask indicating which CPUs are currently running threads
of that process:

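    if (prev->mm != next->mm) {
        /* this CPU stops running prev's process */
        if (prev->mm->running_cpumask)
            cpumask_clear_cpu(...);
        /* and starts running next's; both may apply, so no "else" */
        if (next->mm->running_cpumask)
            cpumask_set_cpu(...);
    }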

Now only processes that want expedited process-wide membarrier pay for
it (everyone else pays only a few predictable branches). You can even
have threads opt in, so unrelated threads that don't participate in the
party don't cause those bits to be set.
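
To make the bookkeeping concrete, here is a minimal userspace sketch of the process-wide variant, not kernel code. Everything in it is an assumption made for illustration: the struct mm and struct task stand-ins, the expedited flag, the 64-bit running_cpumask, and the helper names mm_register_expedited(), context_switch() and membarrier_ipi_targets(). A real implementation would need atomic bit operations and would have to account for threads already running on other CPUs at registration time; the sketch assumes registration happens while only the calling thread is running. Per-thread opt-in is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 64  /* assumption: one uint64_t is wide enough for the mask */

/* Hypothetical stand-in for the kernel's mm_struct. */
struct mm {
    bool     expedited;        /* process declared intent to use the command */
    uint64_t running_cpumask;  /* bit n set: CPU n is running one of its threads */
};

/* Hypothetical stand-in for task_struct; mm == NULL models a kernel thread. */
struct task {
    struct mm *mm;
};

static uint64_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return UINT64_C(1) << cpu;
}

/* Called by a thread of the process, currently running on 'cpu'. */
static void mm_register_expedited(struct mm *mm, int cpu)
{
    mm->expedited = true;
    mm->running_cpumask = cpu_bit(cpu);
}

/* Hook run on 'cpu' when it switches from 'prev' to 'next'. */
static void context_switch(int cpu, const struct task *prev,
                           const struct task *next)
{
    if (prev->mm == next->mm)
        return;  /* same process: the mask does not change */

    /* Unregistered processes pay only for these two branches. */
    if (prev->mm && prev->mm->expedited)
        prev->mm->running_cpumask &= ~cpu_bit(cpu);
    if (next->mm && next->mm->expedited)
        next->mm->running_cpumask |= cpu_bit(cpu);
}

/* CPUs an expedited membarrier issued from 'self_cpu' would have to IPI. */
static uint64_t membarrier_ipi_targets(const struct mm *mm, int self_cpu)
{
    return mm->running_cpumask & ~cpu_bit(self_cpu);
}
```

With this, a process that registered on CPU 0 and later has a second thread scheduled on CPU 1 would get a target mask containing only CPU 1 for a barrier issued from CPU 0, and the mask shrinks again as soon as CPU 1 switches to a different process.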