Re: [PATCH] nvme-pci: ignore bogus CRTO according to NVME 2.0 spec

From: Keith Busch
Date: Fri Sep 08 2023 - 12:51:39 EST


On Fri, Sep 08, 2023 at 06:54:42PM +0300, Felix Yan wrote:
> NVME 2.0 spec section 3.1.3 suggests that "Software should not rely on
> 0h being returned". Here we should safeguard timeout reads when CRTO is 0 and
> fall back to the old NVME 1.4 compatible field.

Not sure I follow what you're saying here. We're not really relying on
CRTO being 0. It was a non-zero capability bit that told the driver to
use CRTO, and 0 is potentially a valid value a controller could report.
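A minimal sketch of the pre-patch selection (the `-` lines in the quoted hunk) as a pure function. It assumes the outer branch, which sits outside the quoted hunk, is gated on CAP.CRMS.CRWMS as described above. The macros are local stand-ins for the kernel's `NVME_CAP_*`/`NVME_CRTO_*` helpers, with bit positions taken from the NVMe 2.0 CAP and CRTO register layouts, and `ready_timeout_prepatch` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Local stand-ins for the kernel macros named in the patch (assumed layout:
 * CAP.TO bits 31:24, CAP.CRMS bits 60:59, CRTO.CRIMT 31:16, CRTO.CRWMT 15:0). */
#define NVME_CAP_TIMEOUT(cap)  ((uint32_t)(((cap) >> 24) & 0xff))
#define NVME_CAP_CRMS_CRWMS    (1ULL << 59)
#define NVME_CAP_CRMS_CRIMS    (1ULL << 60)
#define NVME_CRTO_CRWMT(crto)  ((uint32_t)((crto) & 0xffff))
#define NVME_CRTO_CRIMT(crto)  ((uint32_t)((crto) >> 16))

/*
 * Pre-patch timeout selection, flattened into a pure function.
 * Whether CRTO is consulted at all depends only on the CAP capability
 * bits; a CRTO value of 0 is passed through unchanged.
 * (The real driver also sets CC.CRIME on the CRIMS path; not modeled here.)
 */
static uint32_t ready_timeout_prepatch(uint64_t cap, uint32_t crto)
{
	if (cap & NVME_CAP_CRMS_CRWMS) {
		if (cap & NVME_CAP_CRMS_CRIMS)
			return NVME_CRTO_CRIMT(crto);
		return NVME_CRTO_CRWMT(crto);
	}
	return NVME_CAP_TIMEOUT(cap);
}
```

For example, with CRWMS set, CAP.TO = 0x20 and CRTO = 0, this returns 0, i.e. a zero ready timeout, which is what a controller reporting CRTO = 0 yields on either the CRIMT or the CRWMT path.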

> Fixes 4TB SSD initialization issues with MAXIO MAP1602 controller, including
> Lexar NM790, AIGO P7000Z, Fanxiang S790, Acer Predator GM7, etc.

This patch makes more sense, thanks for getting to the bottom of it.

So the device reports the CRWMS capability. The host is supposed to use
CRTO.CRWMT in that case, and 0 could be legit. But the spec also says
CAP.TO must match CRTO.CRWMT if it's less than 0xff. This device's values
obviously don't match, so your patch looks like a reasonable fallback to me.
Maybe just always set the timeout to the bigger of the two values, since
CRWMT isn't reliable if it's ever smaller than CAP.TO:

timeout = max(NVME_CRTO_CRWMT(crto), NVME_CAP_TIMEOUT(ctrl->cap));
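A minimal standalone sketch of that suggestion, for illustration only. It assumes the outer branch (outside the quoted hunk) is gated on CAP.CRMS.CRWMS, redefines the `NVME_CAP_*`/`NVME_CRTO_*` helpers locally with bit positions from the NVMe 2.0 CAP and CRTO layouts, swaps the kernel's type-checked `max()` for a plain helper, and leaves the CRIMT path alone since only CRWMT is discussed here. `crwmt_consistent` and `ready_timeout_suggested` are hypothetical names. CAP.TO and the CRTO fields share the same 500 ms units, which is what makes comparing them directly meaningful.

```c
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumed NVMe 2.0 bit layout). */
#define NVME_CAP_TIMEOUT(cap)  ((uint32_t)(((cap) >> 24) & 0xff))
#define NVME_CAP_CRMS_CRWMS    (1ULL << 59)
#define NVME_CAP_CRMS_CRIMS    (1ULL << 60)
#define NVME_CRTO_CRWMT(crto)  ((uint32_t)((crto) & 0xffff))
#define NVME_CRTO_CRIMT(crto)  ((uint32_t)((crto) >> 16))

/* Plain replacement for the kernel's type-checked max(). */
static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * The rule quoted above: when CRWMT is below 0xff, CAP.TO must equal it.
 * No constraint is checked for larger CRWMT values.
 */
static int crwmt_consistent(uint64_t cap, uint32_t crto)
{
	uint32_t crwmt = NVME_CRTO_CRWMT(crto);

	return crwmt >= 0xff || NVME_CAP_TIMEOUT(cap) == crwmt;
}

/*
 * Suggested selection: on the CRWMT path, never wait less than CAP.TO.
 * A bogus CRTO of 0 falls back to CAP.TO, and so does any other CRWMT
 * that is smaller than CAP.TO. CC.CRIME handling is not modeled.
 */
static uint32_t ready_timeout_suggested(uint64_t cap, uint32_t crto)
{
	if (!(cap & NVME_CAP_CRMS_CRWMS))
		return NVME_CAP_TIMEOUT(cap);
	if (cap & NVME_CAP_CRMS_CRIMS)
		return NVME_CRTO_CRIMT(crto);	/* unchanged by the suggestion */
	return max_u32(NVME_CRTO_CRWMT(crto), NVME_CAP_TIMEOUT(cap));
}
```

With CAP.TO = 0x20 and a bogus CRTO of 0, `crwmt_consistent()` reports the mismatch and `ready_timeout_suggested()` returns 0x20. Unlike a `crto == 0` special case, the same `max()` also covers a nonzero CRWMT that is merely too small.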

I'll add the Cc: stable tag when applying so they are sure to pick this up.
I'll just wait until next Monday to apply in case there are any other
reviewer comments.

> Signed-off-by: Felix Yan <felixonmars@xxxxxxxxxxxxx>
> ---
> drivers/nvme/host/core.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f3a01b79148c..8ec28b1016ca 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2255,11 +2255,17 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
>  			return ret;
>  		}
>
> -		if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
> -			ctrl->ctrl_config |= NVME_CC_CRIME;
> -			timeout = NVME_CRTO_CRIMT(crto);
> +		if (crto == 0) {
> +			timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> +			dev_warn(ctrl->device, "Ignoring bogus CRTO (0), falling back to NVME_CAP_TIMEOUT (%u)\n",
> +				timeout);
>  		} else {
> -			timeout = NVME_CRTO_CRWMT(crto);
> +			if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
> +				ctrl->ctrl_config |= NVME_CC_CRIME;
> +				timeout = NVME_CRTO_CRIMT(crto);
> +			} else {
> +				timeout = NVME_CRTO_CRWMT(crto);
> +			}
>  		}
>  	} else {
>  		timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> --
> 2.42.0
>