Re: [PATCH 00/14] scsi: scsi_error: Introduce new error handle mechanism
From: Hannes Reinecke
Date: Wed Aug 20 2025 - 08:42:01 EST
On 8/16/25 13:24, JiangJianJun wrote:
> On systems where a large number of SCSI devices share HBAs, it is
> unacceptable to block I/O to all devices while handling failed
> commands. We need a new error handling mechanism to address this issue.
>
> I asked about this issue a year ago; the discussion link can be found
> in the references. Hannes kindly explained why we have to block the
> SCSI host and then perform error recovery. I think it is not necessary
> for every driver to block the SCSI host, and that we can try a
> smaller-scope recovery first (LUN based, for example) to avoid
> blocking the SCSI host.
>
> The new error handling mechanism introduced in this patchset has been
> developed and tested with our self-developed hardware for the past
> year; we would now like it to be usable by more drivers.
>
> Drivers can decide whether to use the new error handling mechanism,
> and how to handle failed commands, when a scsi_device is scanned.
> This makes SCSI error handling more flexible.

Hmm. Yes, and no.
I fully agree that SCSI EH is in need of reworking. But adding
another layer of complexity on top of the existing one ... not sure.

Additionally: TARGET RESET TMF is dead, and was removed from SAM
several years ago. It really is not worthwhile implementing.

Can't we take a simple step, and just try to have a non-blocking version
of device reset?
I think that should cover quite a few issues already.

Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich