Re: [Bug Report] nvme connect deadlock in allocating tag

From: Sagi Grimberg
Date: Sun Apr 28 2024 - 05:30:42 EST

In reply to: Sagi Grimberg: "Re: [Bug Report] nvme connect deadlock in allocating tag"
Next in thread: kwb: "Re: [Bug Report] nvme connect deadlock in allocating tag"

On 28/04/2024 12:16, Wangbing Kuang wrote:

> "The error_recovery work should unquiesce the admin_q, which should fail
> fast all pending admin commands,
> so it is unclear to me how the connect process gets stuck."
> I think the reason is: the commands can be unquiesced, but their tags
> cannot be returned until the commands succeed.

The error recovery also cancels all pending requests. See nvme_cancel_admin_tagset().
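A minimal userspace sketch of this argument, assuming a toy tag set whose names (toy_tagset, toy_tag_alloc, toy_cancel_all) are hypothetical and are not the blk-mq or nvme API: a request holds its tag until it completes, and completing it with an abort status releases the tag just as a successful completion would. An allocation that found the pool exhausted can therefore succeed once the pending requests are cancelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy tag set; NOT the blk-mq or nvme API. */
#define TOY_QUEUE_DEPTH 4

enum toy_status { TOY_IDLE, TOY_INFLIGHT, TOY_DONE_OK, TOY_DONE_ABORTED };

struct toy_tagset {
    bool busy[TOY_QUEUE_DEPTH];              /* true while a request holds the tag */
    enum toy_status status[TOY_QUEUE_DEPTH]; /* last known state of each request */
};

/* Non-blocking allocation: returns a free tag, or -1 if every tag is held.
 * A blocking allocator would sleep at this point instead, so if no tag is
 * ever released the caller (e.g. a connect) waits forever. */
static int toy_tag_alloc(struct toy_tagset *ts)
{
    for (int i = 0; i < TOY_QUEUE_DEPTH; i++) {
        if (!ts->busy[i]) {
            ts->busy[i] = true;
            ts->status[i] = TOY_INFLIGHT;
            return i;
        }
    }
    return -1;
}

/* Any completion, successful or not, releases the tag. */
static void toy_complete(struct toy_tagset *ts, int tag, enum toy_status st)
{
    assert(tag >= 0 && tag < TOY_QUEUE_DEPTH && ts->busy[tag]);
    ts->status[tag] = st;
    ts->busy[tag] = false;
}

/* Rough analogue of cancelling a tag set during error recovery: every
 * in-flight request is completed with an abort status instead of waiting
 * for the target to answer. Returns the number of requests cancelled. */
static int toy_cancel_all(struct toy_tagset *ts)
{
    int cancelled = 0;
    for (int i = 0; i < TOY_QUEUE_DEPTH; i++) {
        if (ts->busy[i]) {
            toy_complete(ts, i, TOY_DONE_ABORTED);
            cancelled++;
        }
    }
    return cancelled;
}
```

For example, after four allocations the pool is exhausted and toy_tag_alloc() returns -1; once toy_cancel_all() has aborted those four requests, the next allocation gets tag 0 again. The model does not show whether the reported kernel reaches the cancel path before the connect blocks; it only illustrates why cancelling should release the tags.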

> "What is step (2) - make nvme io timeout to recover the connection?"
> I use spdk-nvmf-target as the backend, which makes it easy to hang and
> unhang read/write I/O on the target. I make I/O hang for over 30
> seconds, which causes the Linux NVMe-oF host to hit an I/O timeout;
> the timeout then triggers connection recovery.
> By the way, I use multipath=0.
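The reproduction relies on the host's I/O timeout firing before the target resumes I/O. A minimal sketch of that timing condition, assuming a 30-second timeout (the default of the nvme_core io_timeout module parameter); the function names are hypothetical, and real block-layer timeout handling is timer-driven rather than a single comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed timeout in seconds; 30 is the default of the Linux nvme_core
 * io_timeout module parameter, but a host may be configured differently. */
#define TOY_IO_TIMEOUT_SEC 30u

/* In this toy model a request expires once it has been outstanding for at
 * least timeout_sec seconds. */
static bool toy_request_expired(unsigned submit_sec, unsigned now_sec,
                                unsigned timeout_sec)
{
    assert(now_sec >= submit_sec);
    return now_sec - submit_sec >= timeout_sec;
}

/* Hypothetical reproduction model: the target stalls an I/O submitted at
 * t = 0 for hang_sec seconds. The host's timeout handling (and from there
 * connection recovery) only kicks in once the stall reaches the timeout. */
static bool toy_hang_triggers_recovery(unsigned hang_sec, unsigned timeout_sec)
{
    return toy_request_expired(0, hang_sec, timeout_sec);
}
```

With a hang of over 30 seconds, toy_hang_triggers_recovery(31, TOY_IO_TIMEOUT_SEC) is true, while a 10-second stall would end before the host gives up on the request.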

Interesting, does this happen with multipath=Y?
I didn't expect people to be using multipath=0 for fabrics in the past few
years.

> "Is this reproducing with upstream nvme? or is this some distro kernel
> where this happens?"
> It reproduces on a kernel based on v5.15, but I think this is a common
> error.

It would be beneficial to verify this.
