Re: [PATCH v2] nvme-multipath: Early exit if no path is available

From: Chao Leng
Date: Mon Feb 01 2021 - 03:49:26 EST




On 2021/2/1 15:29, Hannes Reinecke wrote:
On 2/1/21 3:16 AM, Chao Leng wrote:


On 2021/1/29 17:20, Hannes Reinecke wrote:
On 1/29/21 9:46 AM, Chao Leng wrote:


On 2021/1/29 16:33, Hannes Reinecke wrote:
On 1/29/21 8:45 AM, Chao Leng wrote:


On 2021/1/29 15:06, Hannes Reinecke wrote:
On 1/29/21 4:07 AM, Chao Leng wrote:


On 2021/1/29 9:42, Sagi Grimberg wrote:

You can't see exactly where it dies but I followed the assembly to
nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL eventually
(list_next_or_null_rcu()).
So there is other bug cause nvme_next_ns abormal.
I review the code about head->list and head->current_path, I find 2 bugs
may cause the bug:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@xxxxxxxxxx/
Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.

The reason why we first remove from head->list and only then clear
current_path is because the other way around there is no way
to guarantee that that the ns won't be assigned as current_path
again (because it is in head->list).
ok, I see.

nvme_ns_remove fences continue of deletion of the ns by synchronizing
the srcu such that for sure the current_path clearance is visible.
The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;

Where does 'old' pointing to?

This may cause infinite loop in nvme_round_robin_path.
for (ns = nvme_next_ns(head, old);
     ns != old;
     ns = nvme_next_ns(head, ns))
The ns will always be ns1, and then infinite loop.

No. nvme_next_ns() will return NULL.
If there is just one path(the "old") and the "old" is deleted,
nvme_next_ns() will return NULL.
The list like this:
head->next = head;
old->next = head;
If there is two or more path and the "old" is deleted,
"for" will be infinite loop. because nvme_next_ns() will return
the path which in the list except the "old", check condition will
be true for ever.

But that will be caught by the statement above:

if (list_is_singular(&head->list))

no?
Two path just a sample example.
If there is just two path, will enter it, may cause no path but there is
actually one path. It is falsely assumed that the "old" must be not deleted.
If there is more than two path, will cause infinite loop.
So you mean we'll need something like this?

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 71696819c228..8ffccaf9c19a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
  static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
                 struct nvme_ns *ns)
  {
-       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
-                       siblings);
-       if (ns)
-               return ns;
+       if (ns) {
+               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
+                                          struct nvme_ns, siblings);
+               if (ns)
+                       return ns;
+       }
No, in the scenario, ns should not be NULL.

Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...

May be we can do like this:

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 282b7a4ea9a9..b895011a2cbd 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
         return found;
  }

-static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
-               struct nvme_ns *ns)
-{
-       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
-                       siblings);
-       if (ns)
-               return ns;
-       return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
-}
+#define nvme_next_ns_condition(head, current, condition) \
+({ \
+       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
+               &(current)->siblings, struct nvme_ns, siblings); \
+       __ptr ? __ptr : (condition) ? (condition) = false, \
+               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
+                       siblings) : NULL; \
+})

Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;

My patch work flow:
nvme_next_ns_condition(head, old, true) return ns2;
nvme_next_ns_condition(head, ns2, true) change the condition to false
and return ns1;
nvme_next_ns_condition(head, ns1, false) return ns2;
nvme_next_ns_condition(head, ns2, false) return NULL;
And then the loop end.

Your patch work flow:
nvme_next_ns(head, old) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
And then the loop is infinite.

Cheers,

Hannes