Re: [PATCH v3 19/21] nvme-tcp: Extend FENCING state per TP4129 on CCR failure
From: Mohamed Khalfella
Date: Tue Feb 17 2026 - 12:59:05 EST
On Mon 2026-02-16 13:56:10 +0100, Hannes Reinecke wrote:
> On 2/14/26 05:25, Mohamed Khalfella wrote:
> > If CCR operations fail and CQT is supported, we must defer the retry of
> > inflight requests per TP4129. Update ctrl->fencing_work to schedule
> > ctrl->fenced_work, effectively extending the FENCING state. This delay
> > ensures that inflight requests are held until it is safe for them to be
> > retried.
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> > ---
> > drivers/nvme/host/tcp.c | 39 +++++++++++++++++++++++++++++++++++----
> > 1 file changed, 35 insertions(+), 4 deletions(-)
> >
> Can't you merge / integrate this into the nvme_fence_ctrl() routine?
ctrl->fencing_work and ctrl->fenced_work live in the transport-specific
controller structure, struct nvme_tcp_ctrl in this case. There is no easy
way to access these members from nvme_fence_ctrl(). One option to work
around that is to move them into struct nvme_ctrl. But we kick off error
recovery after a controller is fenced, and error recovery is implemented
in a transport-specific way. That is why the delay is implemented/repeated
for every transport.
> The previous patch already extended the timeout to cover for CQT, so
> we can just wait for the timeout if CCR failed, no?
Following on from the point above, one change that could be made is to
reset the controller after fencing finishes instead of going through
error recovery. That way everything would live in core.c. But I have not
tested that. Do you think this is better than what is implemented now?