Re: Kernel OOPS while creating a NVMe Namespace

From: Sagi Grimberg
Date: Mon Jun 10 2024 - 15:05:14 EST




On 10/06/2024 21:53, Keith Busch wrote:
> On Mon, Jun 10, 2024 at 01:21:00PM +0530, Venkat Rao Bagalkote wrote:
>> Issue is introduced by the patch: be647e2c76b27f409cdd520f66c95be888b553a3.
> My mistake. The namespace remove list appears to be getting corrupted
> because I'm using the wrong APIs to replace a "list_move_tail". This is
> fixing the issue on my end:
>
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 7c9f91314d366..c667290de5133 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3959,9 +3959,10 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>  	mutex_lock(&ctrl->namespaces_lock);
>  	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id > nsid)
> -			list_splice_init_rcu(&ns->list, &rm_list,
> -					     synchronize_rcu);
> +		if (ns->head->ns_id > nsid) {
> +			list_del_rcu(&ns->list);
> +			list_add_tail_rcu(&ns->list, &rm_list);
> +		}
>  	}
>  	mutex_unlock(&ctrl->namespaces_lock);
>  	synchronize_srcu(&ctrl->srcu);
> --
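
To illustrate why the splice call corrupts the lists, here is a minimal userspace sketch. It assumes plain (non-RCU, unlocked) stand-ins for the kernel list helpers; `demo_move`, `list_count` and the three-entry setup are hypothetical names for illustration, not code from the driver. A splice treats its first argument as a list head, so passing a single entry such as `&ns->list` moves every other node on the ring, including the real `ctrl->namespaces` head, onto `rm_list`, and leaves the entry that was meant to move on neither list.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (no RCU, no locking). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

/* Treats `list` as a list HEAD: moves everything on it to the front of
 * `head`, then reinitialises `list` as empty. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	struct list_head *first = list->next;
	struct list_head *last = list->prev;
	struct list_head *at = head->next;

	if (first == list)
		return;
	first->prev = head;
	head->next = first;
	last->next = at;
	at->prev = last;
	init_list_head(list);
}

/* Count nodes reachable from head, giving up after `limit` steps. */
static int list_count(const struct list_head *head, int limit)
{
	const struct list_head *p;
	int n = 0;

	for (p = head->next; p != head && n < limit; p = p->next)
		n++;
	return n;
}

/* Hypothetical demo: build namespaces = {a, b, c}, then move b to rm_list
 * either with splice (the buggy pattern) or with del + add_tail (the fix).
 * Returns 1 if b ended up as the first entry on rm_list, 0 otherwise. */
static int demo_move(int use_splice, int *ns_count, int *rm_count)
{
	struct list_head namespaces, rm_list, a, b, c;

	init_list_head(&namespaces);
	init_list_head(&rm_list);
	list_add_tail(&a, &namespaces);
	list_add_tail(&b, &namespaces);
	list_add_tail(&c, &namespaces);

	if (use_splice) {
		list_splice_init(&b, &rm_list);
	} else {
		list_del(&b);
		list_add_tail(&b, &rm_list);
	}

	*ns_count = list_count(&namespaces, 16);
	*rm_count = list_count(&rm_list, 16);
	return rm_list.next == &b;
}
```

In this sketch the del + add_tail path leaves two entries on the namespaces list and only b on rm_list. The splice path merges both heads into a single ring and strands b, so a later walk of rm_list would reach the namespaces head and treat it as a namespace entry.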

Can we add a reproducer for this in blktests? I'm assuming that we can easily trigger this
by adding/removing nvmet namespaces?