Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Fri Jan 20 2023 - 06:14:07 EST




On 1/19/2023 9:06 PM, Alan Stern wrote:
On Thu, Jan 19, 2023 at 04:05:38PM +0100, Jonas Oberhauser wrote:

On 1/19/2023 4:13 AM, Alan Stern wrote:
On Wed, Jan 18, 2023 at 10:38:11PM +0100, Jonas Oberhauser wrote:
On 1/18/2023 8:52 PM, Alan Stern wrote:
On Tue, Jan 17, 2023 at 08:31:59PM +0100, Jonas Oberhauser wrote:
- ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
- ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
- fencerel(After-unlock-lock) ; [M])
+ ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M])
Shouldn't the po case of (co | po) remain intact here?
You can leave it here, but it is already covered by two other parts: the
ordering given through ppo/hb is covered by the po-unlock-lock-po & int in
ppo, and the ordering given through pb is covered by its inclusion in
strong-order.
What about the ordering given through
A-cumul(strong-fence)/cumul-fence/prop/hb? I suppose that might be
superseded by pb as well,
Indeed, in fact all of A-cumul(strong-fence) is already covered through pb.
but it seems odd not to have it in hb.
In general, the idea in the memory model is that hb ordering relies on
the non-strong properties of fences, whereas pb relies on the properties
of strong fences, and rb relies on the properties of the RCU fences.
I agree in the sense that all strong-ordering operations are A-cumulative
and not including them in A-cumul is weird.

On the other side  the model becomes a tiny bit smaller and simpler when all
ordering of prop;strong-ordering goes through a single place (pb).

If you want, you could think of the A-cumulativity of strong ordering
operations as being a consequence of their strong properties. Mathematically
That's backward logic. Being strong isn't the reason the fences are
A-cumulative. Indeed, the model could have been written not to assume
that strong fences are A-cumulative.

It's not just the model, it's also how strong fences are introduced in the documentation.
I agree though that you could decouple the notion of strong from A-cumulativity.
But would anyone want a strong fence that is not A-cumulative?
It's a bit like asking in C11 for a barrier that has the sequential consistency guarantee of appearing in the global order S, but doesn't have release or acquire semantics. Yes you could define that, but would it really make sense?


@@ -91,8 +89,12 @@ acyclic hb as happens-before
(* Write and fence propagation ordering *)
(****************************************)
-(* Propagation: Each non-rf link needs a strong fence. *)
-let pb = prop ; strong-fence ; hb* ; [Marked]
+(* Strong ordering operations *)
+let strong-order = strong-fence | ([M] ; po-unlock-lock-po ;
+ [After-unlock-lock] ; po ; [M])
This is not the same as the definition removed above. In particular,
po-unlock-lock-po only allows one step along the locking chain -- it has
rf where the definition above has co.
Indeed it is not, but the subsequent lock-unlock sequences are in hb*. For
this reason it can be simplified to just consider the directly following
unlock().
I'm not sure that argument is right. The smp_mb__after_unlock_lock
needs to go after the _last_ lock in the sequence, not after the first.
So you don't have to worry about subsequent lock-unlock sequences; you
have to worry about preceding lock-unlock sequences.
I formalized a proof of equivalence in Coq a few months ago, but I was
recalling the argument incorrectly from memory.
I think the correct argument is that the previous po-unlock-lock-po form a
cumul-fence*;rfe;po sequence that starts with a po-rel.
so any
    prop; .... ; co ; ... ; this fence ;...
can be rewritten to
    prop; cumul_fence* ; po-unlock-lock-po ; this fence ;...
and because the the first cumul-fence here needs to start with po-release,
the prop ;cumul-fence* can be merged into a larger prop, leaving
    prop; po-unlock-lock-po ; this fence ;...
This may be so, but I would like to point out that the memory model was
not particularly designed to be as short or as simple as possible, but
more to reflect transparently the intuitions that kernel programmers
have about the ways ordering should work in the kernel. It may very
well include redundancies as a result. I don't think that's a bad
point.

I agree that sometimes redundancies have value. But I think having, where possible, a kind of minimal responsibility principle where each fence appears in as few relations as possible also has value.
I've seen that when I try to help people in my team learn to use LKMM: they see a specific type of fence and get stuck for a while chasing one of its uses. For example, they may chase a long prop link using the only strong fence in the example somewhere in the middle, which will then later turn out to be a dead-end because what they need to use is pb.

For someone who doesn't know that we never need to rely on that use to get ordering, it basically doubles the amount of time spent looking at the graph and chasing definitions. And for very good reasons there already are alot of definitions even when redundancies are eliminated.

Perhaps I would say that including these redundancies is good for understanding why the formalization makes sense, but excluding them is better for actually using the formalization.
This includes both when looking at code while having a printout of the model next to you, but also when trying to reason about LKMM itself (e.g., what one might do when changes are made to LKMM and one wants to check that they interact well with the existing parts of LKMM).

I think in the long term, increasing the usability is more important than the obvious correctness. But maybe I'm biased because I'm mostly on the user side of LKMM :D