Control Dependencies vs C Compilers
From: Peter Zijlstra
Date: Tue Oct 06 2020 - 07:47:25 EST
Hi,
Let's give this linux-toolchains thing a test-run...
As some of you might know, there's a bit of a discrepancy between what
compiler and kernel people consider 'valid' use of the compiler :-)
One area where this shows up is in implicit (memory) ordering provided
by the hardware, which we kernel people would like to use to avoid
explicit fences (expensive) but which the compiler is unaware of and
could ruin (bad).
During the last LPC we had a session on that; find here:
https://linuxplumbersconf.org/event/7/contributions/821/
With recordings of the event here:
https://youtu.be/FFjV9f_Ub9o?t=89
That presentation covers 3 different implicit dependencies and various
ways in which a compiler can ruin the game. For this thread, I'd like to
limit things to just control-dependencies. We can start separate threads
for the other issues.
In short, the control dependency relies on the hardware never
speculating stores (that would be instant OOTA, i.e. out-of-thin-air
values) to provide a LOAD->STORE ordering. That is, a LOAD must complete
before a conditional branch that depends on it can be resolved; the
STORE comes after the branch and cannot be made visible until the branch
is resolved (which implies the load is complete).
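To make the LOAD->STORE claim concrete, here is a rough sketch of the classic two-thread case. It is only a sequential model, not a hardware test: it enumerates every interleaving in which each thread's load precedes its dependent store, and counts how often both loads observe 1. All names (struct state, step(), count_both_one()) are made up for illustration.

```c
#include <assert.h>

/* Hypothetical litmus state; both shared locations start at 0. */
struct state { int x, y, r0, r1; };

/*
 * Per-thread steps, in program order:
 *   T0: step 0: r0 = x;   step 1: if (r0) y = 1;
 *   T1: step 0: r1 = y;   step 1: if (r1) x = 1;
 */
static void step(struct state *s, int thread, int idx)
{
	if (thread == 0) {
		if (idx == 0)
			s->r0 = s->x;
		else if (s->r0)
			s->y = 1;
	} else {
		if (idx == 0)
			s->r1 = s->y;
		else if (s->r1)
			s->x = 1;
	}
}

/*
 * Walk every interleaving that keeps each thread's LOAD ahead of its
 * STORE. Returns how many of them end with r0 == 1 && r1 == 1 and
 * reports the number of interleavings tried through *total.
 */
static int count_both_one(int *total)
{
	int count = 0;

	*total = 0;
	/* Bit i of mask says which thread runs in slot i; each runs twice. */
	for (unsigned mask = 0; mask < 16; mask++) {
		struct state s = { 0, 0, 0, 0 };
		int next[2] = { 0, 0 };
		int ones = 0;

		for (int i = 0; i < 4; i++)
			ones += (mask >> i) & 1;
		if (ones != 2)
			continue;

		for (int i = 0; i < 4; i++) {
			int t = (mask >> i) & 1;
			step(&s, t, next[t]++);
		}
		(*total)++;
		if (s.r0 == 1 && s.r1 == 1)
			count++;
	}
	return count;
}

/* As long as LOAD->STORE order holds, the value 1 never appears. */
static void check_model(void)
{
	int total;

	assert(count_both_one(&total) == 0);
	assert(total == 6);
}
```

In this model the only way to get r0 == 1 && r1 == 1 would be for a store to run ahead of the load it depends on, which is exactly what the control dependency is relied upon to rule out.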
However, our 'dear' C language has no clue of any of this.
So given code like:
	x = *foo;
	if (x > 42)
		*bar = 1;
Which, if literally translated into assembly, would provide a
LOAD->STORE order between foo and bar, could, in the hands of an
evil^Woptimizing compiler, become:
	x = *foo;
	*bar = 1;
because it knows, through value tracking, that the condition must be
true.
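As a contrived sketch of how value tracking gets there (my own example, not taken from the kernel): if the loaded value is OR-ed with 64 before the test, the compiler can prove the condition without looking at memory. Whether a given compiler version actually performs the transformation depends on flags and is an assumption here; the point is only that the as-if rule allows it, since single-threaded behaviour is unchanged.

```c
#include <assert.h>

/* As written: the store to *bar sits behind a branch on the loaded value. */
static void as_written(const unsigned *foo, int *bar)
{
	unsigned x = *foo | 64u;	/* x >= 64, whatever *foo holds */

	if (x > 42)
		*bar = 1;
}

/*
 * What an optimizer may legally emit instead: the test is provably
 * true, so the branch (and the load feeding it) can be dropped.
 */
static void as_optimized(const unsigned *foo, int *bar)
{
	(void)foo;
	*bar = 1;
}

/* Single-threaded view: both versions leave *bar in the same state. */
static int same_result(unsigned input)
{
	int bar_a = 0, bar_b = 0;

	as_written(&input, &bar_a);
	as_optimized(&input, &bar_b);
	return bar_a == bar_b && bar_a == 1;
}

static void check_equivalence(void)
{
	unsigned inputs[] = { 0u, 1u, 42u, 43u, ~0u };

	for (unsigned i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++)
		assert(same_result(inputs[i]));
}
```

A single-threaded observer cannot tell the two apart, but the second version has no branch left to carry the control dependency.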
Our Documentation/memory-barriers.txt has a Control Dependencies section
(which I shall not replicate here for brevity) which lists a number of
caveats. But in general the work-around we use is:
	x = READ_ONCE(*foo);
	if (x > 42)
		WRITE_ONCE(*bar, 1);
Where READ_ONCE()/WRITE_ONCE() access the variable through a
volatile-qualified cast. The volatile qualifier dissuades the compiler
from assuming it knows things and we then hope it will indeed emit the
branch like we'd expect.
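For readers outside the kernel, a minimal user-space sketch of the idea follows. MY_READ_ONCE()/MY_WRITE_ONCE() are simplified stand-ins I made up for this mail, not the kernel's actual definitions (which do more), and __typeof__ is a GCC/Clang extension. ctrl_dep_store() is likewise a hypothetical helper.

```c
#include <assert.h>

/* Simplified stand-ins, NOT the kernel's real macros. */
#define MY_READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/*
 * The pattern from above: a volatile load feeding a branch, with a
 * volatile store on the taken path. Returns 1 if the store happened.
 */
static int ctrl_dep_store(int *foo, int *bar)
{
	int x = MY_READ_ONCE(*foo);

	if (x > 42) {
		MY_WRITE_ONCE(*bar, 1);
		return 1;
	}
	return 0;
}

static void usage_example(void)
{
	int foo = 100, bar = 0;

	assert(ctrl_dep_store(&foo, &bar) == 1 && bar == 1);

	foo = 7;
	bar = 0;
	assert(ctrl_dep_store(&foo, &bar) == 0 && bar == 0);
}
```

Note that volatile only forces the two accesses to be emitted as written; nothing in the language says the store must stay behind a conditional branch, which is the gap this mail is about.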
Now, hoping the compiler generates correct code is clearly not ideal and
very dangerous indeed. Which is why my question to the compiler folks
assembled here is:
Can we get a C language extension for this?
And while we have a fair (and growing) number of existing users of this
in the kernel, I'd not be averse to having to annotate them.
Any suggestions from the compiler people present on how they'd like to
provide us this feature?
Even just being able to detect this going wrong would be a step forward.
~ Peter