Re: [PATCH v9 0/7] arm64: Add debug IPI for backtraces / kgdb; try to use NMI for it

From: Mark Rutland
Date: Mon Aug 07 2023 - 06:41:37 EST


Hi Doug,

Apologies for the delay.

On Mon, Jul 24, 2023 at 08:55:44AM -0700, Doug Anderson wrote:
> On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> I'm looking for some ideas on what to do to move this patch series
> forward. Thanks to Daniel, the kgdb patch is now in Linus's tree which
> hopefully makes this simpler to land. I guess there is still the
> irqchip dependency that will need to be sorted out, though...
>
> Even if folks aren't in agreement about whether this is ready to be
> enabled in production, I don't think anything here is super
> objectionable or controversial, is it? Can we land it? If you feel
> like it needs extra review, would it help if I tried to drum up some
> extra people to provide review feedback?

Ignoring the soundness issues I mentioned before (which I'm slowly chipping
away at, and you're likely lucky enough to avoid in practice)...

Having looked over the series, I think the GICv3 bit isn't quite right, but is
easy enough to fix. I've commented on the patch with what I think we should
have there.

The only major thing otherwise from my PoV is the structure of the debug IPI
framework. I'm not keen on that being a separate body of code, and I think it
should live in smp.c along with the other IPIs. I'd also strongly prefer that we
have separate IPI_CPU_BACKTRACE and IPI_CPU_KGDB IPIs, and I think we can do
that either by unifying IPI_CPU_STOP and IPI_CPU_CRASH_STOP, or by reclaiming
IPI_WAKEUP and having the parking protocol reuse a different IPI (e.g.
IPI_RESCHEDULE).

I think it'd be nice if the series could enable NMIs for backtrace and the
CPU_{,CRASH_}STOP cases, with KGDB being the bonus atop. That way it'd be
clearly beneficial for anyone trying to debug lockups even if they're not a
KGDB user.
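
To make the suggested layout concrete, here is a rough, standalone sketch of
the first option (folding the crash-stop case into IPI_CPU_STOP). It is not
taken from the series or from smp.c: the ordering of the entries, the
NR_USABLE_SGIS limit of 8, and the ipi_should_be_nmi() helper are assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 8 SGIs (0-7) are usable as IPIs by the non-secure kernel. */
#define NR_USABLE_SGIS 8

/* Illustrative IPI list with the stop IPIs unified and two debug IPIs added. */
enum ipi_msg_type {
	IPI_RESCHEDULE,
	IPI_CALL_FUNC,
	IPI_CPU_STOP,		/* hypothetical: also covers the crash-stop case */
	IPI_TIMER,
	IPI_IRQ_WORK,
	IPI_WAKEUP,
	IPI_CPU_BACKTRACE,
	IPI_CPU_KGDB,
	NR_IPI
};

/* Fails the build if the list no longer fits in the assumed SGI budget. */
static_assert(NR_IPI <= NR_USABLE_SGIS, "more IPIs than usable SGIs");

/* Hypothetical helper: which IPIs we would try to request as NMIs. */
static inline bool ipi_should_be_nmi(enum ipi_msg_type ipi)
{
	switch (ipi) {
	case IPI_CPU_STOP:
	case IPI_CPU_BACKTRACE:
	case IPI_CPU_KGDB:
		return true;
	default:
		return false;
	}
}
```

With this layout the backtrace and stop IPIs benefit from NMIs on their own,
and the KGDB IPI is just one more entry in the same list rather than a separate
framework.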

> Also: in case it's interesting to anyone, I've been doing benchmarks
> on sc7180-trogdor devices in preparation for enabling this. On that
> platform, I did manage to see about 4% reduction in a set of hackbench
> numbers when fully enabling pseudo-NMI. However, when I instead ran
> Speedometer 2.1 I saw no difference. See:
>
> https://issuetracker.google.com/issues/197061987

Thanks for the pointer!

I know that there are a couple of things that we could do to slightly improve
local_irq_*() when using pNMIs, though I suspect that the bulk of the cost
there will come from the necessary synchronization.

Thanks,
Mark.