Re: LKMM: Making RMW barriers explicit
From: Jonas Oberhauser
Date: Thu May 23 2024 - 16:31:12 EST
Am 5/23/2024 um 6:35 PM schrieb Andrea Parri:
I would phrase it more extreme: I want to get rid of the unnecessary
non-standard parts of the herd representation.
Please indulge the thought that what might appear to be "non-standard"
to one or one's community might appear differently to others.
Certainly. And of course, the formalization of the LKMM doesn't have to
be optimized for the WMM community, even though I suspect that they (and
possibly the tools they develop) are the main direct consumers of the
formalization.
Continuing with the example above, I'm pretty sure your "standard" and
simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
will turn up the nose of many kernel developers...
I'm not sure how true that is. Firstly I'm not sure how many kernel
developers really read the formalization and try to see what it does,
rather than relying on tools to gain an intuition of what's going on.
But let's say a kernel developer wants to make sure that their intuition
of how cmpxchg works is matched by the formal model, e.g., they want to
double check that the formal model is correct
after they got some unexpected results in a herd7 litmus test.
How would they go about it today? The only way is to read the OCaml
source code, because there's no other code that obviously defines the
behavior. At best, they would read
atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)
see the Acquire, Release, and Mb references in the model, and think "ok,
so '__cmpxchg{acquire}' means I get an Acquire tag in the success case
which gives acq_po ordering, '__cmpxchg{release}' means I get a Release
tag in the success case which gives po_rel ordering. Therefore
'__cmpxchg{mb}' must give me an Mb tag in the success case. That would
give me mb ordering, but not between the store and subsequent
operations, and that's just wrong."
But of course this "understanding by analogy" is broken, because there's
a bunch of special OCaml match expressions in herd that look only for
release or mb and just give totally different behavior for that one
case. Every other tag behaves exactly the same way.
At worst they wouldn't even guess that this only is the success ordering
(that's definitely what happened to us in the weak memory model
community, but we don't matter for this argument).
A better situation would be to do something like this:
(*success*)(*fail*)
atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire} {once} (X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg {release} {once} (X,V,W)
atomic_cmpxchg(X,V,W) __cmpxchg {mb} {once} (X,V,W)
and being explicit in the model about what the Mb tag does:
| [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering
such as cmpxchg() *)
| [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence
rmws - are ordered as-if there was an smp_mb() right before the read *)
| [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are
ordered as-if there was an smp_mb() right after the write *)
And suddenly you can read the model and map it 1:1 to the intuition that
you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was
enclosed by smp_mb().
There doesn't need to be a real fence.
The current repre-
sentation for xchg() was described in the ASPLOS'18 work
Do you mean the one example in Table 3?
What about cmpxchg() or cmpxchg_acquire()?
But that's not the point, "standards" can change and certainly kernels
and tools do. My remark was more to point out that you're not getting
rid of anything...,
We're definitely getting rid of some lines in herd7, that have been
added solely for dealing with this specific case of LKMM.
For example, there's some code looking like this (for a memory ordering
tag `a`)
(match a with ["release"] -> a | _ -> a_once)
which specifically refers to the release tag in LKMM and we can turn
that into
a
with no more reference to any LKMM tags. Of course, this is not the
Linux community's problem, just the herd (and others who want to use the
literal cat file of LKMM) maintainers who have to deal with it.
And imagine that at some point you want to add another tag to the linux
kernel - like for example for 'accessing an atomic in initialization
code, so that the compiler can do optimizations like merge a bunch of
bitwise operations'. Let's call it 'init.
What will be the behavior of
WRITE_INIT(X,V) __store{init}(X,V);
with the current herd7? Honestly I have no clue, because there might be a
(match a with ["release"] -> a | _ -> a_once)
somewhere that will turn this 'init into 'once. We'd have to comb the
herd7 codebase to know for sure (or hope that our experiments are
sufficiently thorough to catch all cases).
>you're simply proposing a different representation
I wouldn't phrase it like that, since it's a representation many people
familiar with WMM would expect, but admittedly that doesn't have to be
the deciding factor.
asking kernel developers & maintainers to "deal with it"
Deal with what, no longer having to learn OCaml to be sure that the
LKMM's formal definition matches the description in memory_barriers.txt?
Otherwise, I don't see anything that changes for the worse.
With the change, people need to think for a few seconds to see that the
explicit rules in the .cat file are a formalization of "treat xchg() as
if it had smp_mb() before and after".
Without the change, they need to read an OCaml codebase to see that is
meant by
atomic_xchg(X,V,W) __xchg{mb}(X,V,W)
So I don't see any downsides.
I would also support making the representation explicit in the .def
files instead, with something like
cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r =
__cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } }
Then you get to keep the representation.
Without knowing herd's internals, I have no idea of how to seriously
specify a meta language so that herd could effectively and efficiently
deal with it in general. But one could at least envision some specific
syntax for cmpxchg with a code sequence for the failure case and a code
sequence for the success case.
Personally I don't prefer this option, firstly because I don't see a
good reason for the Linux community to go their own way here. I don't
think there's really much of a problem with saying "this is the
intuition; this is how we formalize it" and for the two not to be
completely identical. It happens all the time, and in this case the gap
between "we formalize it by really having two smp_mb() appear in the
graph if it's successful" and "we formalize it by providing the same
ordering that the smp_mb() would have if it was there" isn't big.
Especially for those kernel people who have enough motivation to
actually deep dive into the formalization in the first place. But it
makes it a lot more accessible for the WMM community, which can only
benefit LKMM.
And secondly because people will probably mostly focus on the .cat file,
which means that it's still a bit of a booby trap to put something so
important (and perhaps surprising) into the .def file, but at least one
that is in the open for people who are more careful and also read the
def file.
> I am looking forward to something more than "because we can".
- it makes it easier to maintain the LKMM in the future, because you
don't have to work around hidden transformations inside herd7
- it makes implicit behavior explicit
- it makes it easier to understand that the formalization matches the
intention
- it makes it easier to learn the LKMM from the formalization without
having to cross-reference every bit with the informal documentation to
avoid misunderstandings
Have a good time,
jonas