Re: [PATCH RFC 3/3] nvme: delay failover by command quiesce timeout

From: Sagi Grimberg
Date: Wed Apr 16 2025 - 18:15:30 EST



CQT comes from the controller, and if it is high, it effectively means
that the controller cannot handle faster failover reliably. So I think
we should leave it as is. It is the vendor problem.
Okay, that is one way to approach it. However, because of the hung
task issue, we would be allowing the vendor to panic the initiator
with a hung task. Until CCR, and without implementing other checks
(for events which might not happen), this hung task would happen on
every messy disconnect with that vendor/array.

It's kind of a pick-your-poison situation, I guess.
We can log an error for controllers that expose overly long CQT...
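
Something along these lines, as a rough userspace sketch only. The
threshold value, the function name and the use of fprintf() are all
made up for illustration (in the driver this would presumably be a
dev_warn() or dev_err() call), and it assumes the CQT value has already
been converted to milliseconds:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical limit; not taken from the patch or from the spec. */
#define CQT_WARN_THRESHOLD_MS 30000u

/*
 * Hypothetical helper: returns true and logs a message when the CQT
 * advertised by the controller exceeds the limit above. It only
 * reports; it does not clamp or otherwise change the value.
 */
static bool cqt_is_excessive(uint32_t cqt_ms)
{
	if (cqt_ms <= CQT_WARN_THRESHOLD_MS)
		return false;

	fprintf(stderr,
		"controller advertises CQT of %u ms (limit %u ms), failover will be delayed\n",
		(unsigned int)cqt_ms, (unsigned int)CQT_WARN_THRESHOLD_MS);
	return true;
}
```

This keeps the controller's value as is and just makes the long delay
visible in the log.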

Not sure we'll see a hung task here though. It's not like there is a
kthread blocking on this; it's a delayed work, so I think the watchdog
won't complain about it...