Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism

From: Bart Van Assche
Date: Thu Sep 24 2015 - 18:54:32 EST

On 09/24/2015 11:14 AM, Tejun Heo wrote:
> On Thu, Sep 24, 2015 at 11:09:33AM -0700, Bart Van Assche wrote:
> > On 09/24/2015 10:49 AM, Tejun Heo wrote:
> > > Again, that doesn't happen.
> >
> > In case anyone would be interested, the backtraces for the lockup I had
> > observed are as follows:
>
> If this is happening and it's not caused by a hung in-flight request,
> it's either percpu_ref being buggy or the forementioned kill/reinit
> race screwing it up. percpu_ref_kill() is expected to disable
> tryget_live() in a finite amount of time regardless of concurrent
> tryget tries.

Hello Tejun,

Sorry that I had not yet made this clear, but I agree with the analysis in your two most recent e-mails. I think I have found the cause of the loop: for some reason the scsi_dh_alua driver was not loaded automatically. I think that caused the SCSI core to return a retryable error code, instead of a non-retryable one, for reads and writes sent over paths in the SCSI ALUA state "standby", which in turn caused the dm-mpath driver to enter an infinite loop. Loading the scsi_dh_alua driver resolved the infinite loop. Anyway, thank you for the feedback.


