Re: [PATCH v2 2/2] nvme: handle persistent internal error AER from NVMe controller

From: hch@xxxxxx
Date: Mon Jun 06 2022 - 02:52:03 EST


On Sat, Jun 04, 2022 at 02:28:11PM +0000, Michael Kelley (LINUX) wrote:
> > driver's irq handler. The other transports block on register reads, though, so
> > they can't call this from an atomic context. The TCP context looks safe, but
> > I'm not sure about RDMA or FC.
>
> Good point. But even if the RDMA and FC contexts are safe,

For RDMA this is typically called from softirq context, so it is indeed
not safe.

> if a
> persistent error is reported, the controller is already in trouble and
> may not respond to a request to retrieve the CSTS anyway. Perhaps
> we should just trust the AER error report and not bother checking
> CSTS to decide whether to do the reset. We can still check ctrl->state
> and skip the reset if there's already one in progress.

Yes, that might be a better option.
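
A minimal sketch of the decision logic discussed above, written as a
standalone userspace model rather than actual driver code. All names
here (model_ctrl, MODEL_CTRL_*, MODEL_AER_PERSIST_INT_ERR,
model_handle_error_aer) are hypothetical stand-ins, and the 0x03 value
for the persistent internal error event is an assumption that should be
checked against the NVMe spec. The point is only that the handler never
reads CSTS: it trusts the AER and uses the controller state to avoid
scheduling a second reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's controller states. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
};

/* Assumed AER error-info value for "persistent internal error". */
#define MODEL_AER_PERSIST_INT_ERR	0x03

struct model_ctrl {
	enum model_ctrl_state state;
	int resets_scheduled;	/* counts resets this model would queue */
};

/*
 * Decide whether to reset based on the AER alone. No register read
 * happens here, so nothing in this path can block. Returns true if a
 * reset was scheduled by this call.
 */
bool model_handle_error_aer(struct model_ctrl *ctrl, unsigned int aer_info)
{
	assert(ctrl != NULL);

	/* Only the persistent internal error triggers a reset. */
	if (aer_info != MODEL_AER_PERSIST_INT_ERR)
		return false;

	/* A reset is already in progress, so skip scheduling another. */
	if (ctrl->state == MODEL_CTRL_RESETTING)
		return false;

	ctrl->state = MODEL_CTRL_RESETTING;
	ctrl->resets_scheduled++;
	return true;
}
```

In the real driver the state check and transition would have to be done
atomically with respect to other reset paths. The model above ignores
locking entirely.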