Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer
From: Peter Zijlstra
Date: Wed Jun 10 2015 - 18:38:00 EST
On Tue, Jun 09, 2015 at 11:33:18PM +0200, Oleg Nesterov wrote:
> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
>
> cpu_base->running = timer;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> __remove_hrtimer(timer);
>
> and
>
> timer->state |= HRTIMER_STATE_ENQUEUED;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> base->running = NULL;
>
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...
Which second wmb?
In any case, you use that transform from your reply to Kirill, and I
cannot currently see a hole in that. Lets call this transformation A. It
gets us the quoted bit above.
Now the above is:
seq++;
smp_wmb();
smp_wmb();
seq++;
Now, double barriers are pointless, so I think we can all agree that the
above is identical to the below. Lets call this tranformation B.
seq++;
smp_wmb();
seq++;
And then because you use the traditional seqcount read side, which
stalls when seq&1, we can transform the above into this. Transformation
C.
smp_wmb();
seq += 2;
Which is write_seqcount_barrier(), as you say above.
And since there are no odd numbers possible in that scheme, its
identical to my modified read side with the single increment. Transform
D.
The only difference at this point is that I have my seq increment on the
'wrong' side on the first state.
cpu_base->running = timer;
seq++;
smp_wmb();
timer->state = 0;
...
timer->state = 1;
smp_wmb();
seq++;
cpu_base->running = NULL;
Which, per my previous mail provides the following:
[S] seq++
[R] seq
RMB
[R] ->running (== NULL)
[S] ->running = timer;
WMB
[S] ->state = INACTIVE
[R] ->state (== INACTIVE)
RMB
[R] seq (== seq)
Which is where we had to modify the read side to do:
[R] ->state
RMB
[R] ->running
Now, if we use write_seqcount_barrier() that would become:
__run_hrtimer() hrtimer_active()
[S] ->running = timer; [R] seq
WMB RMB
[S] seq += 2; [R] ->running
[S] ->state = 0; [R] ->state
RMB
[R] seq
Which we can reorder like:
[R] seq
RMB
[R] ->running (== NULL)
[S] ->running = timer
WMB
[S] ->state = 0
[R] ->state (== 0)
RMB
[R] seq (== seq)
[S] seq += 2
Which still gives us that false negative and would still require the
read side to be modified to do:
[R] ->state
RMB
[R] ->running
IOW, one of our transforms (A-D) is faulty for it requires a
modification to the read side.
I suspect its T-C, where we loose the odd count that holds up the read
side.
Because the moment we go from:
Y = true;
seq++
WMB
seq++
X = false;
to:
Y = true;
WMB
seq += 2;
X = false;
It becomes possible to re-order like:
Y = true;
WMB
X = false
seq += 2;
And we loose our read order; or rather, where previously we ordered the
read side by seq, the seq increments are no longer ordered.
With this I think we can prove my code correct, however it also suggests
that:
cpu_base->running = timer;
seq++;
smp_wmb();
seq++;
timer->state = 0;
...
timer->state = 1;
seq++;
smp_wmb();
seq++;
cpu_base->running = NULL;
vs
hrtimer_active(timer)
{
do {
base = READ_ONCE(timer->base->cpu_base);
seq = read_seqcount_begin(&cpu_base->seq);
if (timer->state & ENQUEUED ||
base->running == timer)
return true;
} while (read_seqcount_retry(&cpu_base->seq, seq) ||
base != READ_ONCE(timer->base->cpu_base));
return false;
}
Is the all-round cheapest solution. Those extra seq increments are
almost free on all archs as the cacheline will be hot and modified on
the local cpu.
Only under the very rare condition of a concurrent hrtimer_active() call
will that seq line be pulled into shared state.
I shall go sleep now, and update my patch tomorrow, lets see if I will
still agree with myself after a sleep :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/