Re: [PATCH rc v7 0/7] iommu/arm-smmu-v3: Fix device crash on kdump kernel

From: Nicolin Chen

Date: Tue Jun 30 2026 - 15:25:36 EST


On Tue, Jun 30, 2026 at 04:08:19PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 30, 2026 at 06:30:41PM +0000, Pranjal Shrivastava wrote:
> > > As I mentioned above in the previous
> > > reply I am not sure I understand what situation leads into this, when
> > > does a device trigger SError to the system vs when not which is observed
> > > as an event in that case.
> >
> > Ack. I see what you mean now.. How does a DMA fault raise an SError?
>
> As I gave an example to Robin if the unhandled failure escalates into
> RAS emergency unplugging CXL memory then the system is going to
> explode when kdump touches that CXL memory as part of the dumping. It
> is not quite so simple that a DMA abort is triggering SError.

Here is the link to that email:
https://lore.kernel.org/all/20260416172005.GB761338@xxxxxxxxxx/

> I don't know exactly the sequence of events that lead up to the kdump
> kernel crashing (I imagine it is hard to debug that one), but it is
> something related to the new kernel not participating in the RAS and
> the RAS flow escalating to something fatal.

Here is the original bug report:
- kernel boots into a crash kernel
- crash kernel hits OOM due to insufficient reserved memory and
  panics
- PCIe errors are observed during this failure flow

Thanks
Nicolin