On Fri, 07 Jul 2023, Jonas Oberhauser <jonas.oberhauser@xxxxxxxxxxxxxxx> wrote:
[...]

>> This is a request for comments on extending the atomic builtins API to
>> help avoid redundant memory barriers. Indeed, there are
>> discrepancies between the Linux Kernel Memory Consistency Model (LKMM)
>> and the C11/C++11 memory consistency model [0]. For example,
>> fully-ordered atomic operations like xchg and successful cmpxchg in
>> LKMM have implicit memory barriers before/after the operations [1-2],
>> while atomic operations using the __ATOMIC_SEQ_CST memory order in
>> C11/C++11 do not provide the ordering guarantees of an atomic thread
>> fence __ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].
>
> The issues run quite a bit deeper than this. The two models have two
> completely different perspectives that are largely incompatible.

Agreed. Our intent is not to close the gap completely, but to reduce
the gap between the two models, by supporting the "full barrier
before/after" semantic of LKMM in the C11/C++11 memory model.
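
For reference, here is a minimal sketch of the emulation that the RFC attributes to liburcu's uatomic API. A fully-ordered, LKMM-style xchg is written with the GCC builtins as a SEQ_CST RMW followed by a SEQ_CST thread fence. The helper name lkmm_style_xchg is hypothetical and is not liburcu's actual API. The ordering argument comes from the RFC and is not re-derived here.

```c
/*
 * Hypothetical helper (not liburcu's real API): emulate an LKMM
 * fully-ordered xchg() with GCC atomic builtins, following the RFC's
 * description: a SEQ_CST read-modify-write followed by a SEQ_CST
 * thread fence. On x86-64 the xchg instruction is already a full
 * barrier, so the trailing fence may cost an extra barrier instruction.
 */
static inline int lkmm_style_xchg(int *addr, int newval)
{
	int old = __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);

	/* Fence intended to provide the "full barrier after" semantic. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return old;
}
```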

> I think all you can really do is bridge the gap at the level of the
> generated assembly. I.e., don't bridge the gap between LKMM and the
> C11 MCM. Bridge the gap between the assembly code generated by C11
> atomics and the one generated by LKMM. But I'm not sure that's really
> the task here.

[...]

However, nothing prevents a toolchain from changing the emitted
assembler in the future, which would make things fragile. The only
thing that is guaranteed to not change is the definitions in the
standard (C11/C++11). Anything else is fair game for optimizations.

>> [...] For example, to make Read-Modify-Write (RMW) operations match
>> the Linux kernel "full barrier before/after" semantics, liburcu's
>> uatomic API has to emit both a SEQ_CST RMW operation and a subsequent
>> thread fence SEQ_CST, which leads to duplicated barriers in some cases.
>
> Does it have to, though? Can't you just do e.g. a release RMW
> operation followed by an after_atomic fence? And for loads, a
> SEQ_CST fence followed by an acquire load? Analogously (but mirrored)
> for stores.

That would not improve anything for RMW. Consider the following example
and its resulting assembler on x86-64 gcc 13.1 -O2:

    int exchange(int *x, int y)
    {
        int r = __atomic_exchange_n(x, y, __ATOMIC_RELEASE);
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
        return r;
    }

    exchange:
        movl    %esi, %eax
        xchgl   (%rdi), %eax
        lock orq $0, (%rsp)    ;; Redundant with previous exchange
        ret

On x86, xchg with a memory operand is implicitly locked and is therefore
already a full barrier, so the lock orq emitted for the fence is pure
overhead.
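
The load and store halves of the suggestion quoted above can be sketched as follows. The exchange() example already covers the RMW case. This assumes that "mirrored for stores" means a release store followed by a SEQ_CST fence. The helper names load_full and store_full are hypothetical. The sketch only illustrates the proposal and does not claim that it matches LKMM.

```c
/*
 * Hypothetical helpers illustrating the suggestion: instead of a
 * SEQ_CST access plus a separate SEQ_CST fence, pair an acquire or
 * release access (covering one side) with a fence (covering the other).
 */

/* Load: the fence supplies the "barrier before" half; the acquire load
 * keeps later accesses from being reordered before the load. */
static inline int load_full(int *addr)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}

/* Store (mirrored): the release store keeps earlier accesses from being
 * reordered after it; the fence supplies the "barrier after" half. */
static inline void store_full(int *addr, int val)
{
	__atomic_store_n(addr, val, __ATOMIC_RELEASE);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}
```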

> You mentioned that the goal is to check some code written using LKMM
> primitives with TSAN due to some formal requirements. What exactly do
> these requirements entail? Do you need to check the code exactly as it
> will be executed (modulo the TSAN instrumentation)? Is it an option to
> map to normal builtins with suboptimal performance just for the
> verification purpose, but then run the slightly more optimized
> original code later?

We aim to validate with TSAN the code that will run during production,
minus TSAN itself.

> Specifically for TSAN's ordering requirements, you may need to make
> LKMM's RMWs into acq+rel with an extra mb, even if all that extra
> ordering isn't necessary at the assembler level.
>
> Also note that no matter what you do, due to the two different
> perspectives, TSAN's hb relation may introduce false positive data
> races w.r.t. LKMM. For example, if the happens-before ordering is
> guaranteed through pb starting with coe/fre.

This is why we have implemented our primitives and changed our
algorithms so that they use the acquire/release semantics of the
C11/C++11 memory model.
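
Here is a minimal sketch of the "acq+rel with an extra mb" RMW mapping suggested above. The exchange carries ACQ_REL ordering, which TSAN can use to derive happens-before. The trailing SEQ_CST fence keeps the "full barrier after" semantic, although, as noted in this thread, TSAN does not take fences into account. The name xchg_acq_rel_mb is hypothetical.

```c
/*
 * Hypothetical helper: RMW mapping that TSAN can follow.
 * The ACQ_REL exchange is visible to TSAN as an acquire+release
 * operation; the SEQ_CST fence is ignored by TSAN but retained for
 * the full-barrier semantic in the real memory model.
 */
static inline int xchg_acq_rel_mb(int *addr, int newval)
{
	int old = __atomic_exchange_n(addr, newval, __ATOMIC_ACQ_REL);

	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	return old;
}
```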

> Without thinking too hard, it seems to me no matter what fences and
> barriers you introduce, TSAN will not see this kind of ordering and
> will consider the situation a data race.

We have come to the same conclusion, mainly because TSAN does not
support thread fences in its verifications.
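
To illustrate the acquire/release style described above, here is a minimal message-passing sketch. The only ordering comes from a release store paired with an acquire load, and TSAN can track that pairing as happens-before. If the flag were accessed with relaxed operations plus __atomic_thread_fence instead, TSAN would likely report a race on the payload, for the reason given in this thread. All names are hypothetical and are not taken from liburcu.

```c
#include <pthread.h>
#include <stddef.h>

static int payload;	/* plain (non-atomic) data */
static int ready;	/* flag, accessed only through atomic builtins */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;					/* plain write */
	__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);	/* publish */
	return NULL;
}

static void *consumer(void *arg)
{
	int *out = arg;

	/* The acquire load pairs with the release store in producer(). */
	while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
		;
	*out = payload;		/* plain read, happens-after the write */
	return NULL;
}

/* Runs one producer and one consumer; returns what the consumer saw. */
static int run_message_passing(void)
{
	pthread_t p, c;
	int seen = 0;

	payload = 0;
	__atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
	pthread_create(&c, NULL, consumer, &seen);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```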