Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline

From: Peter Zijlstra
Date: Thu Jun 28 2018 - 09:07:13 EST


On Thu, Jun 28, 2018 at 05:38:33AM -0700, Paul E. McKenney wrote:
> Please let me try again.
>
> The approach you are suggesting, clever though it is, disables a check

https://lkml.kernel.org/r/20180627094633.GG2512@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Is the one we're talking about, right?

That does not disable any actual check afaict. It simply does not do a
wakeup when ran on an offline CPU. And ensures we do an unconditional
wakeup soon after from a still running CPU.

> of a type that has proved to be an important diagnostic in the past.
> It is only reasonable to assume that this check would be important
> and helpful in the future, but only if that check remains in the code.

I am confused..

> Yes, agreed, given the current structure of the code, this particular
> instance of the check would not matter, but experience indicates that
> RCU code restructuring is not at all uncommon, with the current effort
> being but one case in point.

Once more confused...

> So, unless I am missing something, the only possible benefit of disabling
> this check is getting rid of an acquisition of an uncontended lock in
> a code path that is miles (sorry, kilometers) away from any fastpath.
> So, again, yes, it is clever. If it sped up a fastpath, I might be
> sorely tempted to take it. But the alternative is straightforward and
> isn't anywhere near a fastpath. So, though I do very much appreciate
> the cleverness and creativity, I am not seeing your change to be a
> good tradeoff from a long-term maintainability viewpoint.

I think you mean guarantee/invariant instead of check. But I see it no
different than any other missed rcu_gp_kthread_wake(). You can similarly
fail to make the call while restructuring.