Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model
From: Paul E. McKenney
Date: Mon Jan 29 2024 - 08:50:12 EST
On Sat, Jan 27, 2024 at 11:54:46PM +0100, Frederic Weisbecker wrote:
> Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
> > +static bool tmigr_inactive_up(struct tmigr_group *group,
> > + struct tmigr_group *child,
> > + void *ptr)
> > +{
> > + union tmigr_state curstate, newstate, childstate;
> > + struct tmigr_walk *data = ptr;
> > + bool walk_done;
> > + u8 childmask;
> > +
> > + childmask = data->childmask;
> > + curstate.state = atomic_read(&group->migr_state);
> > + childstate.state = 0;
> > +
> > + do {
>
> So I got the confirmation from Boqun (+Cc) and Paul that a failing cmpxchg
> may not order the load of the old value against subsequent loads. And
> that may apply to atomic_try_cmpxchg() as well.
Plus we checked with Mark Rutland, who agreed and who went further kindly
submitted a patch to clarify the documentation:
https://lore.kernel.org/lkml/20240129122250.1086874-1-mark.rutland@xxxxxxx/
Here is an LKMM litmus test demonstrating that failing cmpxchg() does not
provide ordering:
------------------------------------------------------------------------
C cmpxchg-fail-order
{}
P0(int *x, int *y, int *z)
{
int r0;
int r1;
WRITE_ONCE(*x, 1);
r1 = cmpxchg(z, 1, 0);
r0 = READ_ONCE(*y);
}
P1(int *x, int *y, int *z)
{
int r0;
int r1;
WRITE_ONCE(*y, 1);
r1 = cmpxchg(z, 1, 0);
r0 = READ_ONCE(*x);
}
locations[0:r1;1:r1]
exists (0:r0=0 /\ 1:r0=0)
------------------------------------------------------------------------
Yes, this is cmpxchg() rather than atomic_cmpxchg(), but both have the
same ordering properties.
Here P0() is one thread and P1() the other. Their parameters are the
global shared variables. The "{}" preceding P0() is where initialization
could be placed, but this test is fine with the default initialization
of zero, for example, given that both instances of cmpxchg() specify
the value 1 as the old value (and thus both instances will fail).
The "locations" clause says to print the final values of P0()'s and P1()'s
local variables r1, where the leading "0:" selects P0() and the leading
"1:" selects P1(). And yes, the first process must be named "P0" and
subsequent processes' names must consist of "P" followed by a consecutive
number, so P0(), P1(), P2(), P3(), ...
The "exists" clause gives an expression to be tested "at the end of time".
Again, the "0:" and "1:" select P0() and P1(), respectively. The "="
tests for equality. The "/\" is boolean AND. Boolean OR would be
"\/" and boolean NOT "~". Parentheses may be used. A variable name on
the left-hand side of a relational operator denotes the value of that
variable, but on the right-hand side its address.
The value of any variable mentioned in the "exists" clause is printed as
part of the "States" list shown below.
Running this with the herd7 tool does the moral equivalent of a full
state-space search, and gets the following:
------------------------------------------------------------------------
$ herd7 -conf linux-kernel.cfg /tmp/cmpxchg.litmus
Test cmpxchg-fail-order Allowed
States 4
0:r0=0; 0:r1=0; 1:r0=0; 1:r1=0;
0:r0=0; 0:r1=0; 1:r0=1; 1:r1=0;
0:r0=1; 0:r1=0; 1:r0=0; 1:r1=0;
0:r0=1; 0:r1=0; 1:r0=1; 1:r1=0;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:r0=0 /\ 1:r0=0)
Observation cmpxchg-fail-order Sometimes 1 3
Time cmpxchg-fail-order 0.00
Hash=564afea251867c6127350213c7eb388d
------------------------------------------------------------------------
Here, there are four possible combinations of final values for the two
r0 local variables, and all of them can happen, including the combination
where both are zero, which is the combination called out by the "exists"
clause. The "Sometimes" says that the expression in the "exists" clause
is sometimes true (other options being "Always" and "Never").
The fact that both instances of r0 can be zero shows that failing
cmpxchg() really does not provide ordering. And again, ordering for
atomic_try_cmpxchg() is the same as that for cmpxchg().
Thanx, Paul
> Therefore you not only need to turn group->migr_state read into
> an atomic_read_acquire() but you also need to do this on each iteration
> of this loop. For example you can move the read_acquire right here.
>
> Thanks.
>
> > + if (child)
> > + childstate.state = atomic_read(&child->migr_state);
> > +
> > + newstate = curstate;
> > + walk_done = true;
> > +
> > + /* Reset active bit when the child is no longer active */
> > + if (!childstate.active)
> > + newstate.active &= ~childmask;
> > +
> > + if (newstate.migrator == childmask) {
> > + /*
> > + * Find a new migrator for the group, because the child
> > + * group is idle!
> > + */
> > + if (!childstate.active) {
> > + unsigned long new_migr_bit, active = newstate.active;
> > +
> > + new_migr_bit = find_first_bit(&active, BIT_CNT);
> > +
> > + if (new_migr_bit != BIT_CNT) {
> > + newstate.migrator = BIT(new_migr_bit);
> > + } else {
> > + newstate.migrator = TMIGR_NONE;
> > +
> > + /* Changes need to be propagated */
> > + walk_done = false;
> > + }
> > + }
> > + }
> > +
> > + newstate.seq++;
> > +
> > + WARN_ON_ONCE((newstate.migrator != TMIGR_NONE) && !(newstate.active));
> > +
> > + } while (!atomic_try_cmpxchg(&group->migr_state, &curstate.state, newstate.state));