Re: [PATCH v2 2/2] tools/memory-model: Make ppo a subrelation of po

From: Jonas Oberhauser
Date: Sun Jan 29 2023 - 17:20:18 EST



Hi all, apologies for the confusion about the litmus test.
I should have explained it better, but it seems you mostly figured it out.
As Alan said, I'm cheating a little by not unlocking in certain places, in order to filter out all executions that aren't the ones I'm looking for.
I didn't have much time when I sent it (hence the lack of explanation, and why I hadn't responded earlier), so I didn't have time to play around with the filter feature to do this the "proper"/non-cute way (sketched below).
As such it really isn't about deadlocks.
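(The "proper" way would presumably use herd7's filter clause, which restricts attention to executions satisfying a predicate before the exists clause is checked. A hypothetical sketch, in terms of the registers of the test below:

  filter (1:r0=1 /\ 1:r1=1)
  exists (x=2)

instead of relying on never-released locks to discard the executions I don't care about.)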

I think one question is whether the distinction between the models could be reproduced without using any kind of filtering at all.
I have a feeling it should be possible, but I haven't had time to come up with a litmus test that does that.


On 1/28/2023 11:59 PM, Alan Stern wrote:
> On Sat, Jan 28, 2023 at 11:14:17PM +0100, Andrea Parri wrote:
>>> Evidently the plain-coherence check rules out x=1 at the
>>> end, because when I relax that check, x=1 becomes a possible result.
>>> Furthermore, the graphical output confirms that this execution has a
>>> ww-incoh edge from Wx=2 to Wx=1. But there is no ww-vis edge from Wx=1
>>> to Wx=2! How can this be possible? It seems like a bug in herd7.
>>
>> By default, herd7 performs some edge removal when generating the
>> graphical outputs. The option -showraw can be useful to increase
>> the "verbosity", for example,
>>
>> [with "exists (x=2)", output in /tmp/T.dot]
>> $ herd7 -conf linux-kernel.cfg T.litmus -show prop -o /tmp -skipchecks plain-coherence -doshow ww-vis -showraw ww-vis
> Okay, thanks, that helps a lot.
>
> So here's what we've got. The litmus test:
>
> C hb-and-int
> {}
>
> P0(int *x, int *y)
> {
> 	*x = 1;
> 	smp_store_release(y, 1);
> }
>
> P1(int *x, int *y, int *dx, int *dy, spinlock_t *l)
> {
> 	spin_lock(l);
> 	int r1 = READ_ONCE(*dy);
> 	/* trick: only release the lock if dy was read as 1 */
> 	if (r1 == 1)
> 		spin_unlock(l);
>
> 	int r0 = smp_load_acquire(y);
> 	if (r0 == 1) {
> 		WRITE_ONCE(*dx, 1);
> 	}
> }
>
> P2(int *dx, int *dy)
> {
> 	WRITE_ONCE(*dy, READ_ONCE(*dx));
> }
>
> P3(int *x, spinlock_t *l)
> {
> 	spin_lock(l);	/* can only succeed if P1 released the lock */
> 	smp_mb__after_unlock_lock();
> 	*x = 2;
> }
>
> exists (x=2)
>
> The reason why Wx=1 ->ww-vis Wx=2:
>
> 0:Wx=1 ->po-rel 0:Wy=1 and po-rel < fence < w-post-bounded.
>
> 0:Wy=1 ->rfe 1:Ry=1 ->(hb* & int) 1:Rdy=1 and
> (rfe ; hb* & int) <= (rfe ; xbstar & int) <= vis.
>
> 1:Rdy=1 ->po 1:unlock ->rfe 3:lock ->po 3:Wx=2
> so 1:Rdy=1 ->po-unlock-lock-po 3:Wx=2
> and po-unlock-lock-po <= mb <= fence <= w-pre-bounded.
>
> [...]
>
> This explains why the memory model says there isn't a data race. This
> doesn't use the smp_mb__after_unlock_lock at all.

Note, as Andrea said, that po-unlock-lock-po is generally not in mb (and also not otherwise in fence).
Only po-unlock-lock-po ; [After-unlock-lock] ; po is in mb (resp., its ...&int restriction is in fence).
While the example uses smp_mb__after_unlock_lock(), the anomaly should generally cover any extended fences.
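To spell that out: if I recall tools/memory-model/linux-kernel.cat correctly, the relevant definitions are roughly

  let po-unlock-lock-po = po ; [UL] ; (po | rf) ; [LKR] ; po

with mb containing the clause

  ([M] ; po-unlock-lock-po ; [After-unlock-lock] ; po ; [M])

so bare po-unlock-lock-po, without a trailing smp_mb__after_unlock_lock() ; po, gives no mb (and hence no strong-fence/fence) ordering by itself.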


[Snipping in a part of an earlier e-mail you sent]



> I think that herd7 _should_ say there is a data race.
>
> On the other hand, I also think that the operational model says there
> isn't. This is a case where the formal model fails to match the
> operational model.
>
> The operational model says that if W is a release-store on CPU C and W'
> is another store which propagates to C before W executes, then W'
> propagates to every CPU before W does. (In other words, releases are
> A-cumulative.) But the formal model enforces this rule only in cases
> where the event reading W' on C is po-before W.
>
> In your litmus test, the event reading y=1 on P1 is po-after the
> spin_unlock (which is a release). Nevertheless, any feasible execution
> requires that P1 must execute Ry=1 before the unlock. So the
> operational model says that y=1 must propagate to P3 before the unlock
> does, i.e., before P3 executes the spin_lock(). But the formal model
> doesn't require this.


I see now. Somehow I thought stores must execute in program order, but I guess that doesn't make sense.
In that sense, W ->xbstar&int X always means that W propagates to X's CPU before X executes.
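(For reference, if I remember linux-kernel.cat right, xbstar is roughly

  let xbstar = (hb | pb | rb)*

i.e., "executes before"; so W ->xbstar&int X says that W executes before X on the same CPU, which in the operational reading means W has propagated to X's CPU by the time X executes.)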


> Ideally we would fix this by changing the definition of po-rel to:
>
> 	[M] ; (xbstar & int) ; [Release]
>
> (This is closely related to the use of (xbstar & int) in the definition
> of vis that you asked about.)

This misses the property of release stores that any po-earlier store must also execute before the release store.
Perhaps it could be changed to the old po-rel | [M] ; (xbstar & int) ; [Release], but then one could instead move this into the definition of cumul-fence.
In fact, you'd probably want this for all the propagation fences, so cumul-fence and pb should be the right place.
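Concretely, the shape I have in mind would be an extra disjunct in cumul-fence, something like this (hypothetical sketch, ignoring for the moment the definition-order problem quoted just below):

  (* added to the existing cases of cumul-fence: *)
  ... | ([M] ; (xbstar & int) ; [Release])

so that A-cumulativity of releases is picked up everywhere cumul-fence feeds into prop and pb.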

> Unfortunately we can't do this, because po-rel has to be defined long
> before xbstar.

You could do it by turning the relation into one massive recursive definition.
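For the record, cat does allow mutually recursive definitions via "let rec ... and ...", so in principle (hypothetical sketch, untested):

  let rec po-rel = [M] ; (xbstar & int) ; [Release]
  and fence = ... (* po-rel feeds into fence *)
  and hb = ...    (* fence feeds into hb *)
  and pb = ...
  and rb = ...
  and xbstar = (hb | pb | rb)*  (* closing the cycle back to po-rel *)

but that would pull most of the model into a single fixed point, which is what I mean by "massive".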

Thinking about what the options are:
1) accept the difference and run with it by making it consistent inside the axiomatic model
2) fix it through the recursive definition, which seems to be quite ugly but also consistent with the power operational model as far as I can tell
3) weaken the operational model... somehow
4) just ignore the anomaly
5) ???

Currently my least favorite option is 4), since it seems a bit off that the reasoning applies in only one specific case of LKMM, namely the data race definition, which should be equivalent to "the order of the two racing accesses isn't fixed". But here the order is fixed, and yet it is reported as a data race.
I think the patch happens to almost do 1), because the xbstar&int at the end should already imply ordering through the prop&int <= hb rule (recalled just below).
What would remain is to also exclude rcu-fence somehow.
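(The rule I mean, as I remember it from linux-kernel.cat:

  let hb = [Marked] ; (ppo | rfe | ((prop \ id) & int)) ; [Marked]

so a prop edge between two events on the same CPU already yields hb ordering.)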

2) seems the most correct choice, but it would also complicate LKMM a lot.

Seems like we're caught between a rock and a hard place.
jonas

PS:
>> The other cases of *-pre-bounded seem to work out: they all link stuff via
>> xbstar to the instruction that is linked via po-unlock-lock-po ;
>> [After-unlock-lock] ; po to the potentially racy access, and
>> po-unlock-lock-po ; po is xbstar ; acq-po, which allows closing the gap.
> I could not follow your arguments at all; the writing was too confusing.

Sorry, I collapsed some cases. I'll show an example; I think all the other cases are the same.
Let's pick an edge that links two events with ww-vis through
  w-post-bounded ; vis ; w-pre-bounded
where the vis comes from
  cumul-fence* ; rfe? ; [Marked] ;
    (strong-fence ; [Marked] ; xbstar) <= vis
and the w-pre-bounded comes from po-unlock-lock-po ; [After-unlock-lock] ; po but not from po-unlock-lock-po & int.

Note that such a po-unlock-lock-po ; [After-unlock-lock] ; po must come from
  po-rel ; rfe ; acq-po
(having excluded the internal case, the lock must be taken on another CPU, so the unlock-to-lock step is an rfe from a release to an acquire).

So we have
    w-post-bounded ; cumul-fence* ; rfe? ; [Marked] ;
      strong-fence ; [Marked] ; xbstar ; po-rel ; rfe ; acq-po
  <=
    w-post-bounded ; cumul-fence* ; rfe? ; [Marked] ;
      strong-fence ; [Marked] ; xbstar ; hb ; hb ; acq-po
  <=
    w-post-bounded ; cumul-fence* ; rfe? ; [Marked] ;
      strong-fence ; [Marked] ; xbstar ; fence
  <= ww-vis

All the other cases where w-pre-bounded is used have the shape
    ... ; xbstar ; w-pre-bounded
which can all be handled in the same manner.
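Spelled out, the reduction for those cases is the same as above (again quoting the definitions from memory):

      ... ; xbstar ; po-unlock-lock-po ; [After-unlock-lock] ; po
  <=  ... ; xbstar ; po-rel ; rfe ; acq-po
  <=  ... ; xbstar ; hb ; hb ; acq-po
  <=  ... ; xbstar ; fence

using the external case of po-unlock-lock-po in the first step (the internal case is already a fence); xbstar absorbs the hb steps, and the chains land in the corresponding *-vis relations exactly as before.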