Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.

From: Borislav Petkov
Date: Thu Apr 19 2018 - 12:45:52 EST


On Thu, Apr 19, 2018 at 11:26:57AM -0500, Alex G. wrote:
> At a very high level, I'm working with Dell on improving server
> reliability, with a focus on NVME hotplug and surprise removal. One of
> the features we don't support is surprise removal of NVME drives;
> hotplug is supported with 'prepare to remove'. This is one of the
> reasons NVME is not on feature parity with SAS and SATA.

Ok, first question: is surprise removal something purely mechanical or
do you need firmware support for it? In the sense that you need to tell
the firmware that you will be removing the drive.

I'm sceptical, though, as it has "surprise" in the name so I'm guessing
the firmware doesn't know about it, the drive physically disappears and
the FW starts spewing PCIe errors...

> I'm not sure if this is the example you're looking for, but
> take an r740xd server, and slowly unplug an Intel NVME drive at an
> angle. You're likely to crash the machine.

No no, that's actually a great example!

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.