Re: Some observations (results) on BPF acquire and release

From: Andrea Parri
Date: Fri Oct 25 2024 - 09:57:18 EST


> But the subset of the LKMM which deals with "strong fences" and Acq &
> Rel (limited to so called marked accesses) seems relatively contained
> /simple: its analysis could be useful, if not determining, in trying
> to resolve the above issues.

Elaborating on the previous suggestion/comparison with the LKMM, the
"subset" in question can take the following form (modulo my typos):

"LKMM with acquire/release, strong fences, and marked accesses only"

[...]

let acq-po = [Acq] ; po ; [M]
let po-rel = [M] ; po ; [Rel]
let strong-fence = [M] ; fencerel(Mb) ; [M]
let ppo = acq-po | po-rel | strong-fence

let A-cumul(r) = rfe? ; r
let cumul-fence = A-cumul(strong-fence | po-rel)
let overwrite = co | fr
let prop = (overwrite & ext)? ; cumul-fence* ; rfe?

let hb = ppo | rfe | ((prop \ id) & int)
acyclic hb as Hb

let pb = prop ; strong-fence ; hb*
acyclic pb as Pb


For BPF, we'd want to replace acq-po, po-rel and strong-fence with
load_acquire, store_release and po_amo_fetch respectively: Unless
I'm missing something, this should restore the intended behaviors
for the R and Z6.3 tests discussed earlier.

A couple of other remarks:

- Notice how the above formalization is completely symmetrical wrt.
co <-> fr, IOW, co links are considered "on par with" fr links.
In particular, the following test is satisfiable in the above
formalization, as is the corresponding C test in the LKMM:

BPF 2+2W+release+fence
{
0:r2=x; 0:r4=y;
1:r2=y; 1:r4=x; 1:r6=l;
}
P0 | P1 ;
r1 = 1 | r1 = 1 ;
*(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ;
r3 = 2 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
store_release((u32 *)(r4 + 0), r3) | r3 = 2 ;
| store_release((u32 *)(r4 + 0), r3) ;
exists ([x]=1 /\ [y]=1)

(On an historical note, this wasn't always the case in the LKMM,
cf. e.g. [1], but be alerted that the formalization in [1] is
decisively more involved and less intuitive than today's / what
the LKMM community has converged to. ;-) )

- The above formalization merges the so called "Observation" axiom
in the "Happens-before" axiom. In the LKMM, this followed the
removal of B-cumulativity for smp_wmb() and smp_store_release()
and a consequent "great simplification" of the hb relation: link
[2] can provide more details and some examples related to those
changes. For completeness, here is the BPF analogue of test
"C-release-acquire-is-B-cumulative.litmus" from that article:

BPF ISA2+release+acquire+acquire
{
0:r2=x; 0:r4=y;
1:r2=y; 1:r4=z;
2:r2=z; 2:r4=x;
}
P0 | P1 | P2 ;
r1 = 1 | r1 = load_acquire((u32 *)(r2 + 0)) | r1 = load_acquire((u32 *)(r2 + 0)) ;
*(u32 *)(r2 + 0) = r1 | r3 = 1 | r3 = *(u32 *)(r4 + 0) ;
r3 = 1 | *(u32 *)(r4 + 0) = r3 | ;
store_release((u32 *)(r4 + 0), r3) | | ;
exists (1:r1=1 /\ 2:r1=1 /\ 2:r3=0)

The formalization sketched above allows this behavior. Notice,
however, that the behavior is forbidden after "completion" of
the release/acquire chain, i.e. by making the store from P1 a
store-release (a property also known as A-cumulativy of the
release operation).


I guess the next question (once clarified the intentions for the R
and Z6.3 tests seen earlier) is "Does BPF really care about 2+2W
and B-cumulativity for store-release?"; I mentioned some tradeoff,
but in the end this is a call for the BPF community.

Andrea

[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/StrongModel.html
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Cumulativity