Re: [RFC PATCH 05/14] nvmet: Send an AEN on CCR completion

From: Sagi Grimberg

Date: Sun Jan 04 2026 - 16:10:06 EST




On 01/01/2026 0:00, Mohamed Khalfella wrote:
On Sat 2025-12-27 11:48:49 +0200, Sagi Grimberg wrote:
On 25/12/2025 20:13, Mohamed Khalfella wrote:
On Thu 2025-12-25 15:23:51 +0200, Sagi Grimberg wrote:
On 26/11/2025 4:11, Mohamed Khalfella wrote:
Send an AEN to initiator when impacted controller exists. The
notification points to CCR log page that initiator can read to check
which CCR operation completed.

Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
---
drivers/nvme/target/core.c | 27 +++++++++++++++++++++++----
drivers/nvme/target/nvmet.h | 3 ++-
include/linux/nvme.h | 3 +++
3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 7dbe9255ff42..60173833c3eb 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -202,7 +202,7 @@ static void nvmet_async_event_work(struct work_struct *work)
nvmet_async_events_process(ctrl);
}
-void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
+static void nvmet_add_async_event_locked(struct nvmet_ctrl *ctrl, u8 event_type,
u8 event_info, u8 log_page)
{
struct nvmet_async_event *aen;
@@ -215,12 +215,17 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
aen->event_info = event_info;
aen->log_page = log_page;
- mutex_lock(&ctrl->lock);
list_add_tail(&aen->entry, &ctrl->async_events);
- mutex_unlock(&ctrl->lock);
queue_work(nvmet_wq, &ctrl->async_event_work);
}
+void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
+ u8 event_info, u8 log_page)
+{
+ mutex_lock(&ctrl->lock);
+ nvmet_add_async_event_locked(ctrl, event_type, event_info, log_page);
+ mutex_unlock(&ctrl->lock);
+}
static void nvmet_add_to_changed_ns_log(struct nvmet_ctrl *ctrl, __le32 nsid)
{
@@ -1788,6 +1793,18 @@ struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
}
EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl);
+static void nvmet_ctrl_notify_ccr(struct nvmet_ctrl *ctrl)
+{
+ lockdep_assert_held(&ctrl->lock);
+
+ if (nvmet_aen_bit_disabled(ctrl, NVME_AEN_BIT_CCR_COMPLETE))
+ return;
+
+ nvmet_add_async_event_locked(ctrl, NVME_AER_NOTICE,
+ NVME_AER_NOTICE_CCR_COMPLETED,
+ NVME_LOG_CCR);
+}
+
static void nvmet_ctrl_complete_pending_ccr(struct nvmet_ctrl *ctrl)
{
struct nvmet_subsys *subsys = ctrl->subsys;
@@ -1801,8 +1818,10 @@ static void nvmet_ctrl_complete_pending_ccr(struct nvmet_ctrl *ctrl)
list_for_each_entry(sctrl, &subsys->ctrls, subsys_entry) {
mutex_lock(&sctrl->lock);
list_for_each_entry(ccr, &sctrl->ccrs, entry) {
- if (ccr->ctrl == ctrl)
+ if (ccr->ctrl == ctrl) {
+ nvmet_ctrl_notify_ccr(sctrl);
ccr->ctrl = NULL;
+ }
Is this double loop necessary? Would you have more than one controller
cross-resetting the same
As it is implemented now, CCRs are linked to sctrl. This decision can be
revisited if found suboptimal. At some point I had CCRs linked to
ctrl->subsys, but that led to lock ordering issues. The double loop is
necessary to find all CCRs in all controllers and mark them done.
Yes, it is possible to have more than one sctrl resetting the same
ictrl.
I'm more interested in simplifying.

controller? Won't it be better to install a callback+opaque that the
controller removal will call?
Can you elaborate more on that? Better in what terms?

nvmet_ctrl_complete_pending_ccr() is called from nvmet_ctrl_free() when
we know that ctrl->ref is zero and no new CCRs will be added to this
controller, because nvmet_ctrl_find_get_ccr() will no longer be able to
take a reference to it.
In nvmet, a controller serves a single host, hence I am not sure I
understand how multiple source controllers will try to reset the impacted
controller. So, if there is a 1-1 relationship between source and impacted
controller, I'd perhaps suggest simplifying: install a callback+opaque
(e.g. void *data) on the impacted controller instead of having it iterate,
and then actually send the AEN from the impacted controller.
A controller serves a single path for a given host. A host that is
connected to an nvme subsystem via multiple paths will have more than one
controller. I can think of two reasons why we need to support resetting
an impacted controller from multiple source controllers.

- It is possible for multiple paths to go down at the same time. The
first source controller we use for CCR, even though we check that it is
LIVE, might have lost its connection to the subsystem. It is a matter of
time before it sees a keepalive timeout and fails too. If CCR fails using
this controller we should not give up; we need to try other paths.

But the host is doing the cross-reset synchronously... it waits up to
kato for the reset to complete, and then gives up; it's not like it
is sitting there waiting for the AEN...

Generally, even though the spec states a capability/flexibility, it is still
Linux's choice whether to implement it. I'm trying to understand if we can
simplify the Linux host and controller in this non-trivial error recovery flow.

What do you expect to happen in general? What are your expected kato/cqt
values? How many attempts do we want the host to make?

- Some nvme subsystems might support resetting an impacted controller
from only a subset of the controllers connected to the host. An array
that has multiple frontend engines might not support resetting
controllers across engines. In fact, TP8028 allows the subsystem to
suggest that the host use another source controller via the Alternate
Controller ID (ACID) field in the CCR log page (not implemented in this
patchset).

It is not the case, though, that the impacted controller will be reset from
multiple source controllers at the same time...

I'd also say that if there are indeed subsystems that require specific
controllers to do the cross recovery, they won't be able to use this at
all... Are there any such arrays?