Re: Current LKMM patch disposition

From: Andrea Parri
Date: Mon Feb 13 2023 - 06:16:03 EST


> > Would you like to post a few examples showing some of the most difficult
> > points you encountered? Maybe explanation.txt can be improved.
>
> Just to list 2 of the pain points:
>
> 1. I think it is hard to reason this section
> "PROPAGATION ORDER RELATION: cumul-fence"
>
> All store-related fences should affect propagation order, even the
> smp_wmb() which is not A-cumulative should do so (po-earlier stores
> appearing before po-later). I think expanding this section with some
> examples would make sense to understand what makes "cumul-fence"
> different from any other store-related fence.

FWIW, litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus illustrates
the concept of A-cumulativity. (The terminology is not LKMM-specific, it was
borrowed from other MCM literature, e.g. "Understanding POWER Multiprocessors"
in Documentation/references.txt.)


> 2. This part is confusing and has always confused me " The
> happens-before relation (hb) links memory accesses that have to
> execute in a certain order"
>
> It is not memory accesses that execute, it is instructions that
> execute. Can we separate out "memory access" from "instruction
> execution" in this description?
>
> I think ->hb tries to say that A ->hb B means, memory access A
> happened before memory access B exactly in its associated
> instruction's execution order (time order), but to be specific --
> should that be instruction issue order, or instruction retiring order?
>
> AFAICS ->hb maps instruction execution order to memory access order.
> Not all ->po does fall into that category because of out-of-order
> hardware execution. As does not ->co because the memory subsystem may
> have writes to the same variable to be resolved out of order. It would
> be nice to call out that ->po is instruction issue order, which is
> different from execution/retiring and that's why it cannot be ->hb.
>
> ->rf does because of data flow causality, ->ppo does because of
> program structure, so that makes sense to be ->hb.
>
> IMHO, ->rfi should as well, because it is embodying a flow of data, so
> that is a bit confusing. It would be great to clarify more perhaps
> with an example about why ->rfi cannot be ->hb, in the
> "happens-before" section.
>
> That's really how far I typically get (line 1368) before life takes
> over, and I have to go do other survival-related things. Then I
> restart the activity. Now that I started reading the CAT file as well,
> I feel I can make it past that line :D. But I never wanted to get past
> it, till I built a solid understanding of the contents before it.
>
> As I read the file more, I can give more feedback, but the above are
> different 2 that persist.

AFAICT, sections "The happens-before relation: hb" and "An operational model"
in Documentation/explanation.txt elaborate (should help) clarify such issues.
About the ->rfi example cf. e.g. Test PPOCA in the above mentioned paper; the
test remains allowed in arm64 and riscv.

Andrea