Re: [PATCH v2 08/14] nvme: Implement cross-controller reset recovery


From: James Smart

Date: Tue Feb 10 2026 - 17:49:26 EST

On 2/10/2026 2:27 PM, Mohamed Khalfella wrote:

On Tue 2026-02-10 14:09:27 -0800, James Smart wrote:

On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
...

+unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
+{
+ unsigned long deadline, now, timeout;
+ struct nvme_ctrl *sctrl;
+ u32 min_cntlid = 0;
+ int ret;
+
+ timeout = nvme_fence_timeout_ms(ictrl);
+ dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
+
+ now = jiffies;
+ deadline = now + msecs_to_jiffies(timeout);
+ while (time_before(now, deadline)) {

Q: don't we have something to identify that the controller's subsystem
supports CCR before we start selecting controllers and sending CCR?

I would think on older devices that don't support it we should be
skipping this loop. The loop could delay the Time-Based delay without
any CCR.

I do not think we have something that identifies CCR support at the
subsystem level. The spec defines CCRL at the controller level. The loop
should not be that bad. nvme_find_ctrl_ccr() should return NULL if CCR is
not supported and nvme_fence_ctrl() will return immediately.

-- james

I would think CCRL on the failed controller would be enough to assume the subsystem supports it.

I'm not worried that the coding on the host is so bad. It's more that multiple paths must have cmds sent to them and may return error responses for unknown cmds (these should be responded to ok, but you never know), as well as creating conditions for other errors where there will be no return for it, e.g. other paths losing connectivity while the CCR is outstanding. Yes, they all have to work, but why bother adding these flows to an old controller that would never do CCR?
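
A rough sketch of the early-out described above, written as standalone userspace C rather than kernel code. Everything prefixed with mock_ is hypothetical and is not from the patch: struct mock_ctrl stands in for struct nvme_ctrl, the ccrl field is assumed to hold the controller's reported CCRL, and a CCRL of 0 is assumed to mean CCR is not supported. The point is only that checking the failed controller first means no command is sent down any other path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nvme_ctrl; only what this sketch needs. */
struct mock_ctrl {
	uint16_t cntlid;
	uint8_t ccrl;	/* assumed: reported CCRL, 0 means CCR unsupported */
	bool live;	/* assumed: path is connected and usable */
};

/* Hypothetical stand-in for the subsystem's controller list. */
struct mock_subsys {
	struct mock_ctrl *ctrls;
	size_t nr_ctrls;
};

enum mock_fence_result {
	MOCK_FENCE_SKIPPED,	/* failed controller never advertised CCR */
	MOCK_FENCE_NO_SOURCE,	/* no other live, CCR-capable controller */
	MOCK_FENCE_SENT,	/* a CCR would be issued via a source controller */
};

static bool mock_ctrl_supports_ccr(const struct mock_ctrl *ctrl)
{
	return ctrl->ccrl != 0;
}

/* Pick a live, CCR-capable controller other than the impacted one. */
static struct mock_ctrl *mock_find_ctrl_ccr(struct mock_subsys *subsys,
					    const struct mock_ctrl *ictrl)
{
	size_t i;

	for (i = 0; i < subsys->nr_ctrls; i++) {
		struct mock_ctrl *c = &subsys->ctrls[i];

		if (c != ictrl && c->live && mock_ctrl_supports_ccr(c))
			return c;
	}
	return NULL;
}

/*
 * Check the impacted controller before touching any other path, so an
 * old subsystem never sees a command it does not understand.
 */
static enum mock_fence_result mock_fence_ctrl(struct mock_subsys *subsys,
					      const struct mock_ctrl *ictrl,
					      unsigned int *cmds_sent)
{
	struct mock_ctrl *sctrl;

	if (!mock_ctrl_supports_ccr(ictrl))
		return MOCK_FENCE_SKIPPED;

	sctrl = mock_find_ctrl_ccr(subsys, ictrl);
	if (!sctrl)
		return MOCK_FENCE_NO_SOURCE;

	(*cmds_sent)++;		/* stands in for issuing the CCR command */
	return MOCK_FENCE_SENT;
}
```

With the check up front, the skipped case falls straight through to the time-based delay and cmds_sent stays at zero regardless of what the other controllers report.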

-- james