Re: [PATCH v2] nvme-multipath: Early exit if no path is available

From: Chao Leng
Date: Mon Feb 01 2021 - 20:13:41 EST




On 2021/2/1 18:45, Hannes Reinecke wrote:
On 2/1/21 10:40 AM, Chao Leng wrote:


On 2021/2/1 16:57, Hannes Reinecke wrote:
On 2/1/21 9:47 AM, Chao Leng wrote:


On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;

And this is where I have issues with.
Where does 'old' come from?
Clearly it was part of the list at one point; so what happened to it?
I explained this earlier.
In nvme_ns_remove, there is a hole between list_del_rcu and
nvme_mpath_clear_current_path. If head->current_path is the "old", and
the "old" is removing. The "old" is already removed from the list by
list_del_rcu, but head->current_path is not clear to NULL by
nvme_mpath_clear_current_path.
Find path is race with nvme_ns_remove, use the "old" pass to
nvme_round_robin_path to find path.

Ah. So this should be better:

@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
 static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
                struct nvme_ns *ns)
 {
-       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
-                       siblings);
-       if (ns)
-               return ns;
+       if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) {
+               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
+                                          struct nvme_ns, siblings);
+               if (ns)
+                       return ns;
+       }
        return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
 }

The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned.
Looks useless, it is still infinite loop. You can check the workflow for the scenario I mentioned.

Cheers,

Hannes