Re: [PATCH v3 1/1] nvme: multipath: Implemented new iopolicy "queue-depth"

From: Sagi Grimberg
Date: Wed May 22 2024 - 06:54:55 EST

On 22/05/2024 13:48, Nilay Shroff wrote:

On 5/21/24 20:14, John Meneghini wrote:
On 5/21/24 06:16, Sagi Grimberg wrote:
Exactly, nvme_mpath_init_ctrl resets the counter.
Except you're right, the counter reset needs to move to nvme_mpath_init_identify()
or some place that is called on every controller reset.
This however raises the question of how much failover/reset testing this patch has seen...
It has received quite a bit of testing with failover and controller resets. I shared some of the testing that was done at LSFMM last week.

It has received enough testing to make me confident that this code is safe.  That is: it won't panic, corrupt data, or otherwise do any harm.  We believe the error paths will not be affected by this change... but I agree that running the error paths could negatively impact the accuracy of the nr_active counters... which could lead to an inaccurate outcome with the queue-depth policy.

I agree the nr_active counter initialization should move to nvme_mpath_init_identify(), or perhaps be done there in addition to nvme_mpath_init_ctrl(). I'm willing to make that change now... if that's what people want. I don't think it would require any extensive retesting.

/John


I think with Keith's recently proposed patch for fixing I/O accounting on failover,
nvme_mpath_end_request() would be called even for cancelled I/O, so the nr_active
counter should be adjusted correctly for cancelled I/O requests. Having said that, IMO
you should consider moving the initialization of the nr_active counter to
nvme_mpath_init_identify(), as that's a common function invoked from the regular
controller initialization code path as well as the reset code path.

Yes, and preferably with a comment explaining why it's there (despite having nothing to do with identify...)