Re: [PATCH] nvme-pci: ignore bogus CRTO according to NVME 2.0 spec

From: Keith Busch
Date: Mon Sep 11 2023 - 21:57:13 EST


On Fri, Sep 08, 2023 at 06:54:42PM +0300, Felix Yan wrote:
> NVME 2.0 spec section 3.1.3 suggests that "Software should not rely on
> 0h being returned". Here we should safeguard timeout reads when CRTO is 0 and
> fall back to the old NVME 1.4 compatible field.
>
> Fixes 4TB SSD initialization issues with MAXIO MAP1602 controller, including
> Lexar NM790, AIGO P7000Z, Fanxiang S790, Acer Predator GM7, etc.
>
> ----------
> nvme nvme1: Device not ready; aborting initialisation, CSTS=0x0
> ----------
>
> Signed-off-by: Felix Yan <felixonmars@xxxxxxxxxxxxx>
> ---
> drivers/nvme/host/core.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f3a01b79148c..8ec28b1016ca 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2255,11 +2255,17 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
>  			return ret;
>  		}
>
> -		if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
> -			ctrl->ctrl_config |= NVME_CC_CRIME;
> -			timeout = NVME_CRTO_CRIMT(crto);
> +		if (crto == 0) {
> +			timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> +			dev_warn(ctrl->device, "Ignoring bogus CRTO (0), falling back to NVME_CAP_TIMEOUT (%u)\n",
> +				timeout);
>  		} else {
> -			timeout = NVME_CRTO_CRWMT(crto);
> +			if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
> +				ctrl->ctrl_config |= NVME_CC_CRIME;
> +				timeout = NVME_CRTO_CRIMT(crto);
> +			} else {
> +				timeout = NVME_CRTO_CRWMT(crto);
> +			}
>  		}
>  	} else {
>  		timeout = NVME_CAP_TIMEOUT(ctrl->cap);

What do you think about this change instead? We don't need to print a
warning on every device reset, but we should probably add a comment
explaining why this is happening.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37b6fa7466620..b4577a860e677 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2245,6 +2245,7 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 	else
 		ctrl->ctrl_config = NVME_CC_CSS_NVM;

+	timeout = NVME_CAP_TIMEOUT(ctrl->cap);
 	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
 		u32 crto;

@@ -2257,12 +2258,15 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)

 		if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
 			ctrl->ctrl_config |= NVME_CC_CRIME;
-			timeout = NVME_CRTO_CRIMT(crto);
+			/*
+			 * CRIMT should always be greater or equal to CAP.TO,
+			 * but some devices are known to get this wrong. Use
+			 * the larger of the two values.
+			 */
+			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
 		} else {
 			timeout = NVME_CRTO_CRWMT(crto);
 		}
-	} else {
-		timeout = NVME_CAP_TIMEOUT(ctrl->cap);
 	}

 	ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
--
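
For anyone who wants to poke at the timeout selection outside the kernel,
here is a rough user-space sketch of the logic in the diff above. It is not
the driver code: ready_timeout() and max_u32() are hypothetical helpers, and
the macro definitions (CAP.TO in bits 31:24, CRWMS/CRIMS in bits 59 and 60,
CRWMT/CRIMT as the low/high halves of CRTO) are my assumptions about the
register layout and should be checked against include/linux/nvme.h. All
values are in the registers' raw 500 ms units.

```c
#include <stdint.h>

/* Assumed field layouts; verify against include/linux/nvme.h. */
#define CAP_TO(cap)       ((uint32_t)(((cap) >> 24) & 0xff)) /* CAP.TO */
#define CAP_CRMS_CRWMS    (1ULL << 59)
#define CAP_CRMS_CRIMS    (1ULL << 60)
#define CRTO_CRWMT(crto)  ((crto) & 0xffff)
#define CRTO_CRIMT(crto)  ((crto) >> 16)

/* Hypothetical stand-in for the kernel's max() macro. */
static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper mirroring the proposed diff: start from CAP.TO,
 * then let CRTO override it when the controller advertises CRMS.
 * As in the diff, only the CRIMT branch is clamped to CAP.TO.
 */
static uint32_t ready_timeout(uint64_t cap, uint32_t crto)
{
	uint32_t timeout = CAP_TO(cap);

	if (cap & CAP_CRMS_CRWMS) {
		if (cap & CAP_CRMS_CRIMS)
			timeout = max_u32(timeout, CRTO_CRIMT(crto));
		else
			timeout = CRTO_CRWMT(crto);
	}

	return timeout;
}
```

With CAP.TO = 60 and both CRMS bits set, a CRTO of 0 yields 60 rather than 0,
while a controller reporting CRIMT = 200 still gets 200. A controller that
sets only CRWMS and reports CRTO = 0 still ends up with 0 in this sketch,
since the diff leaves that branch unchanged.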