Re: [patch 2/3] scheduler: add full memory barriers upon taskswitch at runqueue lock/unlock

From: Mathieu Desnoyers
Date: Mon Feb 01 2010 - 14:56:35 EST


* Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
>
>
> On Mon, 1 Feb 2010, Mathieu Desnoyers wrote:
> >
> > Here is the detailed execution scenario showing the race.
>
> No. You've added random smp_mb() calls, but you don't actually show what
> the f*ck they are protecting against.
>
> For example
>
> > First sys_membarrier smp_mb():
>
> I'm not AT ALL interested in the sys_membarrier() parts. You can have a
> million memory barriers there, and I won't care. I'm interested in what
> you think the memory barriers elsewhere protect against. It's a barrier
> between _which_ two operations?
>
> You can't say it's a barrier "around" the
>
> cpumask_clear(mm_cpumask, cpu);
>
> because a barrier is between two things. So if you want to add two
> barriers around that mm_cpumask access, you need to describe the _three_
> events your barriers are between in that call-path (with mm_cpumask being
> just one of them).
>
> And then, once you've described _those_ three events, you describe what
> the sys_membarrier interaction is, and how mm_cpumask is involved there.
>
> I'm not interested in the user-space code. Don't even quote it. It's
> irrelevant apart from the actual semantics you want to guarantee for the
> new membarrier() system call. So don't quote the code, just explain what
> the actual barriers are.
>

The two event pairs we are looking at are:

Pair 1)

* memory accesses (loads/stores) performed by the user-space thread
before the context switch.
* cpumask_clear_cpu(cpu, mm_cpumask(prev));

Pair 2)

* cpumask_set_cpu(cpu, mm_cpumask(next));
* memory accesses (loads/stores) performed by the user-space thread
after the context switch.

I can see two ways to add memory barriers in switch_mm that would
provide ordering for these two memory access pairs:

Either A)

switch_mm()
        smp_mb__before_clear_bit();
        cpumask_clear_cpu(cpu, mm_cpumask(prev));
        cpumask_set_cpu(cpu, mm_cpumask(next));
        smp_mb__after_set_bit();

or B)

switch_mm()
        cpumask_set_cpu(cpu, mm_cpumask(next));
        smp_mb__before_clear_bit();
        cpumask_clear_cpu(cpu, mm_cpumask(prev));

(B) seems like a clear win, as we get the ordering right for both pairs
with a single memory barrier, but I don't know whether swapping the
set/clear bit order could have nasty side-effects on other mm_cpumask
users.

sys_membarrier uses mm_cpumask to iterate over all CPUs on which the
current process's mm is in use, so that it can issue an smp_mb()
through an IPI on each CPU that needs it. Without the ordering of
pairs 1 and 2 detailed above, we could miss a CPU that actually needs
the memory barrier.

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68