Pavel,
It turns out that the error handler on our systems was not getting woken up for a different reason. I submitted a patch earlier today that fixes the issue I was seeing (I CCed you on the patch).
Before I got my hands on the failing system and was able to root-cause it, I was pretty sure that your patch was going to fix our issue: after examining the code paths, I couldn't find any other reason the error handler would not get woken up. I tried to force the bug that your patch fixes by compiling some mdelay()s into a key place or two in the SCSI code, but it never failed for me that way. With my patch, several systems that previously failed in 10 minutes or less ran successfully for many days.
Thanks,
Stuart
On 11/9/2017 8:54 AM, Pavel Tikhomirov wrote:
Are there any issues with this patch (https://patchwork.kernel.org/patch/9938919/) that Pavel Tikhomirov submitted back in September? I am willing to help if there's anything I can do to get it accepted.
Hi Stuart, I asked James Bottomley about the patch status off-list, and it seems the problem is that the patch lacks testing and review. I would highly appreciate a review from your side, and from anyone else who wants to participate!
And if you can confirm that the patch solves the problem in your environment with no side effects, please also add a "Tested-by:" tag.
Thanks, Pavel
On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
We have a problem with the SCSI error handler (EH) on several of our nodes. Imagine the following order of execution of two threads:
CPU1 scsi_eh_scmd_add                    CPU2 scsi_host_queue_ready

/* shost->host_busy == 1 initially */

                                         if (shost->shost_state == SHOST_RECOVERY)
                                                 /* does not get here */
                                                 return 0;

lock(shost->host_lock);
shost->shost_state = SHOST_RECOVERY;

                                         busy = shost->host_busy++;
                                         /* host->can_queue == 1 initially, busy == 1
                                          * - go to starved label */
                                         lock(shost->host_lock) /* waits */

shost->host_failed++;
/* shost->host_busy == 2, shost->host_failed == 1 */
call scsi_eh_wakeup(shost) {
        if (host_busy == host_failed) {
                /* does not get here */
                wake_up_process(shost->ehandler)
        }
}
unlock(shost->host_lock)

                                         /* acquires the lock */
                                         shost->host_busy--;
In the end we never wake up scsi_error_handler, and all commands that come in afterwards hang: we are stuck in the recovery state forever, as no one is left to wake the handler. So the SCSI disks on such a host become unresponsive and all bios on the node hang. (We trigger this problem when SCSI commands to a DVD drive time out.)
The main idea of the fix is to attempt the wakeup every time we decrement host_busy or increment host_failed (the latter case is already handled correctly). Now the very *last* of the busy threads to take host_lock after decrementing host_busy will see all writes to the host's shost_state, host_busy and host_failed completed, thanks to the memory barriers implied by spin_lock/spin_unlock, so at the moment busy == failed at least one thread will trigger the wakeup. (That is why the recovery and failed checks are placed under the lock.)
Signed-off-by: Pavel Tikhomirov <ptikhomirov@xxxxxxxxxxxxx>
---
 drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..6c99221d60aa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 	if (starget->can_queue > 0)
 		atomic_dec(&starget->target_busy);
 
+	spin_lock_irqsave(shost->host_lock, flags);
 	if (unlikely(scsi_host_in_recovery(shost) &&
-		     (shost->host_failed || shost->host_eh_scheduled))) {
-		spin_lock_irqsave(shost->host_lock, flags);
+		     (shost->host_failed || shost->host_eh_scheduled)))
 		scsi_eh_wakeup(shost);
-		spin_unlock_irqrestore(shost->host_lock, flags);
-	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
 
 	atomic_dec(&sdev->device_busy);
 }
@@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
 	spin_unlock_irq(shost->host_lock);
 out_dec:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 	return 0;
 }
 
@@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 out_dec_host_busy:
 	atomic_dec(&shost->host_busy);
+
+	spin_lock_irq(shost->host_lock);
+	if (unlikely(scsi_host_in_recovery(shost) &&
+		     (shost->host_failed || shost->host_eh_scheduled)))
+		scsi_eh_wakeup(shost);
+	spin_unlock_irq(shost->host_lock);
+
 out_dec_target_busy:
 	if (scsi_target(sdev)->can_queue > 0)
 		atomic_dec(&scsi_target(sdev)->target_busy);