RE: [RFC PATCH -logging 00/10] scsi/constants: Output continuous error messages on trace

From: Elliott, Robert (Server Storage)
Date: Thu Aug 21 2014 - 20:41:54 EST


> -----Original Message-----
> From: linux-scsi-owner@xxxxxxxxxxxxxxx [mailto:linux-scsi-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Yoshihiro YUNOMAE
> Sent: Friday, 08 August, 2014 6:50 AM
> Subject: [RFC PATCH -logging 00/10] scsi/constants: Output continuous
> error messages on trace
...
> 1) printk
> Keeps current implemntation of upstream kernel.
> The messages are divided and can be mixed, but all users can
> check the error messages without any settings.

scsi_io_completion ignore the scsi_logging_level and always calls
printk if it detects ACTION_FAIL, resulting in messages like:

[10240.338600] sd 2:0:0:0: [sdr]
[10240.339722] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[10240.341662] sd 2:0:0:0: [sdr]
[10240.342792] Sense Key : Hardware Error [current]
[10240.344575] sd 2:0:0:0: [sdr]
[10240.345653] Add. Sense: Logical unit failure
[10240.347138] sd 2:0:0:0: [sdr] CDB:
[10240.348309] Read(10): 28 00 00 00 00 80 00 00 08 00

If you trigger hundreds of errors (e.g., hot remove a device
during heavy IO), then all the prints to the linux serial console
bog down the system, causing timeouts in commands to other
devices and soft lockups for applications.

Some changes that would help are:
1. Put them under SCSI logging level control
2. Use printk_ratelimited so an excessive number are trimmed

Would you like to include something like this in your
patch set?

This is an example patch that only prints them if the MLCOMPLETE
logging level is nonzero.
Off: scsi_logging_level --set --mlcomplete=0
On: scsi_logging_level --set --mlcomplete=1

Some other loglevel (e.g., ERROR_RECOVERY) could be used.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d6b4ea8..dbb601f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1037,7 +1037,9 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
switch (action) {
case ACTION_FAIL:
/* Give up and fail the remainder of the request */
- if (!(req->cmd_flags & REQ_QUIET)) {
+ if (!(req->cmd_flags & REQ_QUIET) &&
+ SCSI_LOG_LEVEL(SCSI_LOG_MLCOMPLETE_SHIFT,
+ SCSI_LOG_MLCOMPLETE_BITS)) {
scsi_print_result(cmd);
if (driver_byte(result) & DRIVER_SENSE)
scsi_print_sense("", cmd);

Converting to printk_ratelimited is harder since the prints
are spread out over three functions (and as your patch
series notes, many individual printk calls). The rates
for the printk calls might not match, which would lead to
even more confusing output.

---
Rob Elliott HP Server Storage