Re: [PATCH v5] nvme: multipath: Implemented new iopolicy "queue-depth"

From: John Meneghini
Date: Thu May 23 2024 - 11:09:01 EST


On 5/23/24 02:52, Christoph Hellwig wrote:
+ /*
+ * queue-depth iopolicy does not need to reference ->current_path
+ * but round-robin needs the last path used to advance to the
+ * next one, and numa will continue to use the last path unless
+ * it is or has become not optimized
+ */

Can we please turn this into a full sentence? I.e.:

/*
* The queue-depth iopolicy does not need to reference ->current_path,
* but the round-robin iopolicy needs the last path used to advance to
* the next one, and numa will continue to use the last path unless
* it is or has become non-optimized.
*/

?

I can do that.

+ if (iopolicy == NVME_IOPOLICY_QD)
+ return nvme_queue_depth_path(head);
+
+ node = numa_node_id();
ns = srcu_dereference(head->current_path[node], &head->srcu);
if (unlikely(!ns))
return __nvme_find_path(head, node);
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
+ if (iopolicy == NVME_IOPOLICY_RR)
return nvme_round_robin_path(head, node, ns);
+
if (unlikely(!nvme_path_is_optimized(ns)))
return __nvme_find_path(head, node);
return ns;

Also this is growing into the kind of spaghetti code that is on the fast
path to become unmaintainable. I'd much rather see the
srcu_dereference + __nvme_find_path duplicated and have a switch over
the iopolicies with a separate helper for each of them here than the
various ifs at different levels.


OK, I will turn this into a switch statement.
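
For illustration only, here is a userspace sketch of the shape being asked for above: one switch over the iopolicies, with a separate helper per policy and the fallback lookup repeated inside the helpers that need it. This is not the kernel code. The struct layouts, the helper names (find_path, numa_path, round_robin_path, queue_depth_path, select_path) and the simplified round-robin (which ignores path state) are all assumptions made to keep the example standalone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures in the patch. */
enum iopolicy { IOPOLICY_NUMA, IOPOLICY_RR, IOPOLICY_QD };

struct path {
	int id;
	int optimized;	/* non-zero if the path is optimized */
	int nr_active;	/* outstanding requests on this path */
};

struct head {
	enum iopolicy iopolicy;
	struct path *paths;
	size_t nr_paths;
	struct path *current_path;	/* last path used, or NULL */
};

/* Fallback lookup: first optimized path, else the first path. */
static struct path *find_path(struct head *h)
{
	struct path *found = &h->paths[0];

	for (size_t i = 0; i < h->nr_paths; i++) {
		if (h->paths[i].optimized) {
			found = &h->paths[i];
			break;
		}
	}
	h->current_path = found;
	return found;
}

/* numa: keep using the last path unless it is missing or non-optimized. */
static struct path *numa_path(struct head *h)
{
	struct path *p = h->current_path;

	if (!p || !p->optimized)
		return find_path(h);
	return p;
}

/* round-robin: advance from the last path used (simplified assumption). */
static struct path *round_robin_path(struct head *h)
{
	struct path *p = h->current_path;

	if (!p)
		return find_path(h);
	h->current_path = &h->paths[(size_t)(p - h->paths + 1) % h->nr_paths];
	return h->current_path;
}

/* queue-depth: pick the path with the fewest outstanding requests. */
static struct path *queue_depth_path(struct head *h)
{
	struct path *best = &h->paths[0];

	for (size_t i = 1; i < h->nr_paths; i++)
		if (h->paths[i].nr_active < best->nr_active)
			best = &h->paths[i];
	return best;
}

/* One switch, one helper per policy; queue-depth never reads current_path. */
static struct path *select_path(struct head *h)
{
	assert(h && h->nr_paths > 0);

	switch (h->iopolicy) {
	case IOPOLICY_QD:
		return queue_depth_path(h);
	case IOPOLICY_RR:
		return round_robin_path(h);
	case IOPOLICY_NUMA:
	default:
		return numa_path(h);
	}
}
```

The point of this layout is that each policy's use of the last-used path is visible in its own helper, rather than being spread across ifs at different levels.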

+static void nvme_subsys_iopolicy_update(struct nvme_subsystem *subsys, int iopolicy)

Overly long line here.

I can fix this.

+{
+ struct nvme_ctrl *ctrl;
+ int old_iopolicy = READ_ONCE(subsys->iopolicy);
+
+ if (old_iopolicy == iopolicy)
+ return;
+
+ WRITE_ONCE(subsys->iopolicy, iopolicy);
+
+ /* iopolicy changes reset the counters and clear the mpath by design */
+ mutex_lock(&nvme_subsystems_lock);
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+ atomic_set(&ctrl->nr_active, 0);
+ nvme_mpath_clear_ctrl_paths(ctrl);
+ }
+ mutex_unlock(&nvme_subsystems_lock);

You probably want to take the lock over the iopolicy assignment to
serialize it. And why do we need the atomic_set here?

Since we are targeting this for 6.11, I will work on refactoring this code.

I'll remove the atomic_set() here, but I may also add a WARN_ON_ONCE
someplace just to be sure our assumptions about the nr_active counter
state are correct. I agree with Keith that these counters need to be
accurate in order for queue-depth to work.
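
As a rough userspace sketch of the two review points above (take the lock over the assignment, and check the counters instead of resetting them), something along these lines could work. Everything here is an assumption for illustration: the struct layouts, the subsys_iopolicy_update name, the pthread mutex standing in for nvme_subsystems_lock, the paths_cleared flag standing in for nvme_mpath_clear_ctrl_paths(), and in particular the invariant being checked (nr_active must never be negative), which the thread does not specify.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum iopolicy { IOPOLICY_NUMA, IOPOLICY_RR, IOPOLICY_QD };

/* Hypothetical stand-in for a controller. */
struct ctrl {
	atomic_int nr_active;	/* outstanding requests; left untouched here */
	int paths_cleared;	/* stands in for clearing the cached paths */
};

/* Hypothetical stand-in for a subsystem. */
struct subsys {
	pthread_mutex_t lock;	/* stands in for nvme_subsystems_lock */
	atomic_int iopolicy;	/* readers load this without the lock */
	struct ctrl *ctrls;
	size_t nr_ctrls;
};

/*
 * Change the iopolicy with the comparison and the assignment both under
 * the lock, so concurrent updates are serialized. The counters are not
 * reset. Returns how many controllers violated the assumed invariant
 * (nr_active < 0); the kernel version would use WARN_ON_ONCE instead.
 */
static int subsys_iopolicy_update(struct subsys *s, int iopolicy)
{
	int violations = 0;

	assert(s != NULL);

	pthread_mutex_lock(&s->lock);
	if (atomic_load(&s->iopolicy) != iopolicy) {
		atomic_store(&s->iopolicy, iopolicy);
		for (size_t i = 0; i < s->nr_ctrls; i++) {
			struct ctrl *c = &s->ctrls[i];

			if (atomic_load(&c->nr_active) < 0)
				violations++;
			c->paths_cleared = 1;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return violations;
}
```

Leaving nr_active alone matters because requests already in flight will still decrement it on completion, so zeroing it during a policy change could itself make the counter inaccurate.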

+
+ pr_notice("%s: changed from %s to %s for subsysnqn %s\n", __func__,
+ nvme_iopolicy_names[old_iopolicy], nvme_iopolicy_names[iopolicy],

Please avoid the overly long line here as well.

Hmm... I thought I fixed this already. Will do.

NVME_REQ_CANCELLED = (1 << 0),
NVME_REQ_USERCMD = (1 << 1),
NVME_MPATH_IO_STATS = (1 << 2),
+ NVME_MPATH_CNT_ACTIVE = (1 << 3),

This does not match the indentation above.

I must have my tab stop set incorrectly in .vimrc or something. I'll fix this.

/John