Re: [PATCH v4 0/4] arm64: cross-CPU NMI via SDEI

From: Kiryl Shutsemau

Date: Fri Jun 26 2026 - 15:41:42 EST
On Fri, Jun 26, 2026 at 06:07:10PM +0100, Catalin Marinas wrote:
> On Wed, Jun 17, 2026 at 08:20:01PM +0100, Kiryl Shutsemau wrote:
> > - A CPU stopped by the SDEI rung is parked, not powered off via PSCI
> > CPU_OFF. Reaching and dumping the wedged CPU -- the point of the
> > series -- works, and it matches the shared stop path's own park
> > fallback when CPU_OFF is unavailable. The consequence is that an SMP
> > crash-capture kernel cannot re-online such a CPU (it stays "already
> > on"); the capture kernel boots and runs on the remaining CPUs.
> > Powering the stopped CPU off so a capture kernel can reclaim it
> > requires completing the SDEI event and then CPU_OFF, which hit a
> > firmware-specific issue still under investigation; it is left as a
> > follow-up and does not affect the dump's contents.
>
> Just to understand, your firmware cannot cope with a PSCI CPU_OFF from
> the SDEI handler? This is one of the required calls to be supported.

I did chase it a fair bit. Bisecting on Grace: completing the event and
parking (no CPU_OFF) works, and so does the stack-switch + C-call setup
on its own. The hang only appears once I call PSCI CPU_OFF after the
event -- and it persists even with DAIF masked and the GIC PMR reset
first, so it isn't leftover interrupt/priority state from the dispatch.
It's a silent wedge: no TF-A exception report, nothing after the
last console line.

But I have not tried calling CPU_OFF directly, without completing the
event first; I had assumed completion was required. I will give it a try
when I have time.

Either way this is a side quest: it only lets a crash kernel reclaim the
stopped CPU. The dump itself is complete, so it's nice-to-have, not
required for the series.

--
Kiryl Shutsemau / Kirill A. Shutemov