It has received quite a bit of testing with failover and controller resets. I shared some of the testing that was done at LSFMM last week. I think with Keith's recent proposed patch for fixing io accounting on failover, the
It has received enough testing to make me confident that this code is safe. That is: it won't panic, corrupt data, or otherwise do any harm. We believe the error paths will not be affected by this change... but I agree that running the error paths could negatively impact the accuracy of the nr_active counters... which could lead to an inaccurate outcome with the queue-depth policy.
I agree the nr_active counter initialization should move to nvme_mpath_init_identify(), or maybe be done there in addition to nvme_mpath_init_ctrl(). I'm willing to make that change now... if that's what people want. I don't think it would require any extensive retesting.
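To illustrate the point about where the counter gets initialized, here is a rough userspace sketch. It is not the kernel code: struct ctrl_model and every model_*() helper are hypothetical stand-ins, and the only assumption taken from this thread is that the identify-time init runs on both the regular initialization path and the reset path.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of a controller; NOT the kernel's struct nvme_ctrl. */
struct ctrl_model {
	atomic_int nr_active;	/* requests currently counted as in flight on this path */
};

/* Models one-time controller setup (assumed to run once per controller). */
static inline void model_init_ctrl(struct ctrl_model *c)
{
	atomic_init(&c->nr_active, 0);
}

/* Models the identify-time init, assumed to run on initial connect and on every reset. */
static inline void model_init_identify(struct ctrl_model *c)
{
	atomic_store(&c->nr_active, 0);
}

/* Account for a request being dispatched on this path. */
static inline void model_start_request(struct ctrl_model *c)
{
	atomic_fetch_add(&c->nr_active, 1);
}

/* Account for a request finishing, whether it completed or was cancelled. */
static inline void model_end_request(struct ctrl_model *c)
{
	int prev = atomic_fetch_sub(&c->nr_active, 1);

	assert(prev > 0);	/* catch an unbalanced end in this model */
}

/* The value a queue-depth style path selector would compare across paths. */
static inline int model_queue_depth(struct ctrl_model *c)
{
	return atomic_load(&c->nr_active);
}
```

In this model, if an error path ever skipped model_end_request(), the stale count would persist until the next model_init_identify() call, which is why doing the reset there bounds how long an inaccurate counter can skew path selection.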
/John
nvme_mpath_end_request() would be called even for cancelled IO and so the nr_active
counter shall be adjusted correctly for cancelled IO requests. Having said that, IMO
you should consider moving the initialization of the nr_active counter to nvme_mpath_init_identify(),
as that's a common function invoked from the regular controller initialization code path as well
as the reset code path.