Re: [RFC] arch hardlockup detector interfaces improvement
From: Nicholas Piggin
Date: Fri May 19 2017 - 18:54:31 EST
On Fri, 19 May 2017 15:43:45 -0400
Don Zickus <dzickus@xxxxxxxxxx> wrote:
> On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote:
> > > I am curious to know what IBM thinks there. Currently the HARDLOCKUP
> > > detector sits on top of perf. I get the impression, you are removing that
> > > dependency. Is that a permanent thing or are you thinking of switching back
> > > and forth depending on if SOFTLOCKUP is enabled or not?
> >
> > We want to get away from perf permanently.
> >
> > The PMU interrupts are not specially non-maskable from a hardware
> > POV, everything gets masked when you turn off interrupts in hardware.
> > powerpc arch code implements a software disable layer, and PMU
> > interrupts are differentiated there by being allowed to run even
> > under local_irq_disable();
> >
> > We have a few issues with using perf for it. We disable it by
> > default because using it for that breaks another PMU feature.
> >
> > But PMU interupts are not special, so it would be possible to e.g.,
> > take the timer interrupt before soft disable and have it touch the
> > watchdog if it fires while under local_irq_disable(). That give
> > exact same kind of pseudo-NMI as perf interrupts, without using PMU.
> >
> > Further, we now want to introduce a local_pmu_disable() type of
> > interface that extends this soft disable layer to perf interrupts
> > as well for some cases. Once we start doing that, more code will
> > be exempt from the hardlockup watchdog, whereas a watchdog specific
> > hook from the timer interrupt would still cover it.
>
> Interesting. Thanks for that info.
>
> Do you think you can split the patch in half for me? You are shuffling code
> around and making subtle changes in the shuffled code. It is hard to double
> check everything. Namely watchdog_nmi_reconfigure and watchdog_update_cpus
> suddenly appear.
>
> The first patch would be straight code movement, the second the real
> changes. I almost don't care if you throw in the LOCKUP->SOFTLOCKUP changes
> in the first patch either.
>
> I am really interested in the subtle changes to make sure you covered the
> various race conditions that we have uncovered over the years.
Yes that makes sense, I'll do that.
> I also would like to see a sample of the watchdog_nmi_reconfigure() you had
> in mind. Usually we hide all the nmi stuff underneath the watchdog
> functions. But those are threaded, which is what you are trying to avoid,
> so the placement of the function makes sense but looks odd.
I'm just calling it after any sysctl changes or suspend/resume etc, and
the lockup detector I have basically just shuts down everything and
then re-starts with the new parameters (could be made much fancier but
I didn't see a need.
I will split out this patch and re-post to you with the powerpc patch in
the series that you can take a look at.
Thanks,
Nick