Re: [PATCH v4 10/15] nvme-tcp: Use CCR to recover controller that hits an error
From: Randy Jennings
Date: Fri May 15 2026 - 18:56:52 EST
On Mon, Mar 30, 2026 at 4:00 AM Hannes Reinecke <hare@xxxxxxx> wrote:
> On 3/28/26 01:43, Mohamed Khalfella wrote:
> > An alive nvme controller that hits an error now will move to FENCING
> > state instead of RESETTING state. ctrl->fencing_work attempts CCR to
> > terminate inflight IOs. Regardless of the success or failure of CCR
> > operation the controller is transitioned to RESETTING state to continue
> > error recovery process.
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> > @@ -2644,13 +2669,15 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
> > struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
> > struct nvme_command *cmd = &pdu->cmd;
> > int qid = nvme_tcp_queue_id(req->queue);
> > + enum nvme_ctrl_state state;
> >
> > dev_warn(ctrl->device,
> > "I/O tag %d (%04x) type %d opcode %#x (%s) QID %d timeout\n",
> > rq->tag, nvme_cid(rq), pdu->hdr.type, cmd->common.opcode,
> > nvme_fabrics_opcode_str(qid, cmd), qid);
> >
> > - if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
> > + state = nvme_ctrl_state(ctrl);
> > + if (state != NVME_CTRL_LIVE && state != NVME_CTRL_FENCING) {
> > /*
> > * If we are resetting, connecting or deleting we should
> > * complete immediately because we may block controller
>
> Do we need to call nvme_tcp_error_recovery() even if the controller is
> in 'FENCING'?
> Don't we just need to return BLK_EH_RESET_TIMER when in 'FENCING' ?
While I see that calling nvme_tcp_error_recovery() works in the
FENCING state, having to remind myself of that is cognitive load
I'd rather not have. On top of that, there is currently a race
condition on whether a warning is printed.
So, I agree with Hannes that it would be simpler to understand
if we just return BLK_EH_RESET_TIMER if in the FENCING state.
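To make the suggestion concrete, here is a rough, untested sketch of
the decision I have in mind. It is a stand-alone model, not a patch:
the enum, struct, and function names below are hypothetical stand-ins
for the kernel's controller states, blk_eh_timer_return values, and
nvme_tcp_timeout(), and the flags only record which action the real
handler would take.

```c
/* Stand-alone model; every name here is a hypothetical stand-in. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_FENCING,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
	MODEL_CTRL_DELETING,
};

enum model_eh_return { MODEL_EH_DONE, MODEL_EH_RESET_TIMER };

struct model_timeout_result {
	enum model_eh_return ret;
	int started_recovery;	/* models calling nvme_tcp_error_recovery() */
	int completed_request;	/* models completing the request immediately */
};

static struct model_timeout_result model_timeout(enum model_ctrl_state state)
{
	struct model_timeout_result r = { MODEL_EH_DONE, 0, 0 };

	if (state == MODEL_CTRL_FENCING) {
		/*
		 * Assumption: fencing_work is already driving recovery
		 * (it moves the controller to RESETTING either way), so
		 * only rearm the timer and do nothing else.
		 */
		r.ret = MODEL_EH_RESET_TIMER;
		return r;
	}

	if (state != MODEL_CTRL_LIVE) {
		/* resetting, connecting or deleting: complete immediately */
		r.completed_request = 1;
		r.ret = MODEL_EH_DONE;
		return r;
	}

	/* LIVE: start error recovery and rearm the timer */
	r.started_recovery = 1;
	r.ret = MODEL_EH_RESET_TIMER;
	return r;
}
```

With this shape FENCING is the only state that returns without side
effects, so the timeout path never has to reason about what
nvme_tcp_error_recovery() does while fencing is in progress.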
Sincerely,
Randy Jennings