Re: [REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices

From: Steffen Maier
Date: Tue Mar 29 2022 - 06:57:36 EST

In reply to: Wenchao Hao: "[REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices"
Next in thread: Wenchao Hao: "Re: [REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices"

On 3/29/22 11:06, Wenchao Hao wrote:

> A SCSI command timeout calls scsi_eh_scmd_add() under some conditions, which
> sets the host to the SHOST_RECOVERY state. Once the host enters
> SHOST_RECOVERY, I/O submitted to any device on this host does not succeed
> until scsi_error_handler() has finished. scsi_error_handler() can take a long
> time to complete, which is unbearable when the host has a massive number of
> devices.
>
> I want to ask: is anyone working on a different error handling flow to
> address this problem?
>
> I think we could move some operations (such as SCSI get sense, SCSI send
> start unit, and SCSI device reset) out of scsi_unjam_host(), so that they are
> performed without setting the host to SHOST_RECOVERY. That would reduce the
> time during which the whole host is blocked.
>
> Looking forward to your discussion.

We already have "async" aborts before even entering scsi_eh. So your use case seems to imply that those aborts fail and we enter scsi_eh?

There's eh_deadline, which limits the time spent escalating within scsi_eh; once the deadline expires, scsi_eh skips the remaining steps and goes directly to host reset. Would this help?
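
In case it is useful, here is a minimal sketch of setting eh_deadline on all hosts through sysfs. It assumes the per-host attribute /sys/class/scsi_host/hostN/eh_deadline, which takes a number of seconds or the string "off". The helper name set_eh_deadline and its sysfs_root parameter are made up for illustration and are not part of any kernel tooling; writing to the real sysfs files needs root.

```python
from pathlib import Path


def set_eh_deadline(seconds, sysfs_root="/sys/class/scsi_host"):
    """Write eh_deadline for every hostN below sysfs_root.

    seconds: non-negative integer, or None to write "off".
    Returns a dict mapping host name to the value read back.
    (Hypothetical helper; sysfs_root is a parameter so it can be
    pointed at a scratch directory for testing.)
    """
    if seconds is None:
        value = "off"
    else:
        seconds = int(seconds)
        if seconds < 0:
            raise ValueError("eh_deadline must be >= 0 or None")
        value = str(seconds)

    result = {}
    for host in sorted(Path(sysfs_root).glob("host*")):
        attr = host / "eh_deadline"
        if not attr.exists():
            # Skip hosts that do not expose the attribute.
            continue
        attr.write_text(value + "\n")
        result[host.name] = attr.read_text().strip()
    return result
```

For example, set_eh_deadline(10) would ask every host to give up on escalation after 10 seconds, and set_eh_deadline(None) writes "off" again. Whether a short deadline is appropriate depends on how disruptive a host reset is for the particular LLDD.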

--
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z and LinuxONE

