Re: [PATCH rc v7 0/7] iommu/arm-smmu-v3: Fix device crash on kdump kernel
From: Jason Gunthorpe
Date: Tue Jun 30 2026 - 15:08:30 EST
On Tue, Jun 30, 2026 at 06:30:41PM +0000, Pranjal Shrivastava wrote:
> > As I mentioned above in the previous
> > reply I am not sure I understand what situation leads into this, when
> > does a device trigger SError to the system vs when not which is observed
> > as an event in that case.
>
> Ack. I see what you mean now.. How does a DMA fault raise an SError?
As in the example I gave to Robin, if the unhandled failure escalates
into RAS emergency-unplugging CXL memory, then the system is going to
explode when kdump touches that CXL memory as part of the dumping. It
is not quite so simple as a DMA abort triggering SError.
I don't know exactly the sequence of events that leads up to the kdump
kernel crashing (I imagine it is hard to debug that one), but it is
something related to the new kernel not participating in the RAS and
the RAS flow escalating to something fatal.
Jason