Re: [tip: core/rcu] rcu: Enable tick for nohz_full CPUs slow to provide expedited QS

From: Paul E. McKenney
Date: Sat Jan 25 2020 - 14:48:49 EST


On Sat, Jan 25, 2020 at 06:54:42PM +0100, Borislav Petkov wrote:
> On Sat, Jan 25, 2020 at 08:10:50AM -0800, Paul E. McKenney wrote:
> > How big? (Seriously, given that the fix may depend on the number of CPUs.)
>
> [ 7.660017] smp: Brought up 2 nodes, 256 CPUs
>
> > So the problem appears to be that some of the boot-time processing
> > is looping in the kernel, which is preventing the grace period from
> > completing. One could argue that such code should be fixed, but on the
> > other hand, boot time is a bit special. Later in -rcu's dev branch,
> > there are commits that forgive this boot-time misbehavior, but this is
> > a bit late in the process to dump all of those commits into -tip.
>
> Aha.
>
> > The RT guys might need the warning, and it was them that I was thinking
> > of when adding it.
>
> But "boot time is a bit special". Or do they care about deadlines during
> boot too?

Maybe, but not that I know of. If they do, this would be an excellent
time for them to let me know!

My guess is "no" because the real-time application would not yet be
running during boot. On the other hand, if this issue is due not so much
to boot, but to (say) expensive filesystem operations on large systems,
that might be a different story.

Except that I would have hard questions to ask of someone doing expensive
filesystem operations while their deep-sub-millisecond real-time
application was running. So even then, I doubt that they would care.

Again, if I am wrong about this, this would be an excellent time for
them to let me know.

> > But let's see what works for mainline first. And
> > since your box was booting fine without the warning before, I bet that
> > it boots just fine with that warning removed.
>
> Yes, it does.

Woo-hoo!!!

> > So could you please try out the (untested) patch below?
>
> Warning's gone.

Very good. I will get it properly prepared and tested, then send it
along to Ingo.

> > If that works, I will re-introduce the warning with proper protection
> > for the merge window following this coming one.
>
> My big box is at your service if you need stuff tested later.

Thank you in advance! I just might take you up on that!

In the meantime, one question... Are you testing for realtime suitability
on your big box? If so, to what extent?

> Thx Paul.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

Aside from habitually failing to trim emails, which of these was I
violating? ;-)

Thanx, Paul