Re: [RFC PATCH 07/14] nvme: Add RECOVERING nvme controller state
From: Sagi Grimberg
Date: Sat Dec 27 2025 - 04:52:43 EST
On 25/12/2025 19:17, Mohamed Khalfella wrote:
> On Thu 2025-12-25 15:29:52 +0200, Sagi Grimberg wrote:
>> On 26/11/2025 4:11, Mohamed Khalfella wrote:
>>> Add NVME_CTRL_RECOVERING as a new controller state to be used when
>>> impacted controller is being recovered. A LIVE controller enters
>>> RECOVERING state when an IO error is encountered. While recovering
>>> inflight IOs will not be canceled if they timeout. These IOs will be
>>> canceled after recovery finishes. Also, while recovering a controller
>>> can not be reset or deleted. This is intentional because reset or delete
>>> will result in canceling inflight IOs. When recovery finishes, the
>>> impacted controller transitions from RECOVERING state to RESETTING state.
>>> Reset codepath takes care of queues teardown and inflight requests
>>> cancellation.
>> Is RECOVERING really capturing the nature of this state? Maybe RESETTLING?
>> or QUIESCING?
> Naming is hard. QUIESCING sounds better, I will rename it to
> QUIESCING.
>>> Note, there is no transition from RECOVERING to RESETTING added to
>>> nvme_change_ctrl_state(). The reason is that user should not be allowed
>>> to reset or delete a controller that is being recovered.
>>> Add NVME_CTRL_RECOVERED controller flag. This flag is set on a controller
>>> about to schedule delayed work for time based recovery.
>>> Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
>>> ---
>>>  drivers/nvme/host/core.c  | 10 ++++++++++
>>>  drivers/nvme/host/nvme.h  |  2 ++
>>>  drivers/nvme/host/sysfs.c |  1 +
>>>  3 files changed, 13 insertions(+)
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index aa007a7b9606..f5b84bc327d3 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -574,6 +574,15 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>>>  			break;
>>>  		}
>>>  		break;
>>> +	case NVME_CTRL_RECOVERING:
>>> +		switch (old_state) {
>>> +		case NVME_CTRL_LIVE:
>>> +			changed = true;
>>> +			fallthrough;
>>> +		default:
>>> +			break;
>>> +		}
>>> +		break;
>> That is a strange transition...
> Why is it strange?
> We transition to RECOVERING state only if controller is LIVE. This is
> when we expect to have inflight user IOs to be quiesced by CCR. We do
> not care about inflight requests in other states.
Sorry, got confused myself - I read it as the other way around...
I am missing RECOVERING -> RESETTING transition in this patch.
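
For anyone following the exchange, the transition rule being discussed can be modeled in a few lines of standalone C. This is only a sketch under assumptions, not the patch itself: the `model_*` names are hypothetical, the state set is reduced to four entries, and the RESETTING case assumes it is reachable from NEW and LIVE only. What it tries to show is that the quoted hunk admits LIVE -> RECOVERING and nothing else, so a reset request that goes through the generic check is refused while the controller is RECOVERING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the controller states; not the kernel's enum. */
enum model_state {
	MODEL_NEW,
	MODEL_LIVE,
	MODEL_RESETTING,
	MODEL_RECOVERING,
};

struct model_ctrl {
	enum model_state state;
};

/*
 * Hypothetical model of the generic state-change check. Returns true and
 * updates ctrl->state only when the old -> new transition is allowed.
 */
static bool model_change_state(struct model_ctrl *ctrl,
			       enum model_state new_state)
{
	enum model_state old_state;
	bool changed = false;

	assert(ctrl != NULL);
	old_state = ctrl->state;

	switch (new_state) {
	case MODEL_RECOVERING:
		/* As in the quoted hunk: only LIVE may enter RECOVERING. */
		switch (old_state) {
		case MODEL_LIVE:
			changed = true;
			break;
		default:
			break;
		}
		break;
	case MODEL_RESETTING:
		/*
		 * Assumption: RESETTING is reachable from NEW and LIVE only,
		 * so a reset requested while RECOVERING is rejected here.
		 */
		switch (old_state) {
		case MODEL_NEW:
		case MODEL_LIVE:
			changed = true;
			break;
		default:
			break;
		}
		break;
	default:
		break;
	}

	if (changed)
		ctrl->state = new_state;
	return changed;
}
```

In this model, any RECOVERING -> RESETTING step at the end of recovery would have to bypass `model_change_state()` or be added as an explicit case, which is the transition the last comment above says is missing from this patch.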