Re: scsi logging future directions, was Re: [RFC PATCH -logging 00/10] scsi/constants: Output continuous error messages on trace

From: Hannes Reinecke
Date: Mon Aug 25 2014 - 07:31:09 EST


On 08/24/2014 10:44 PM, Christoph Hellwig wrote:
On Fri, Aug 22, 2014 at 12:39:59AM +0000, Elliott, Robert (Server Storage) wrote:
If you trigger hundreds of errors (e.g., hot remove a device
during heavy IO), then all the prints to the linux serial console
bog down the system, causing timeouts in commands to other
devices and soft lockups for applications.

Some changes that would help are:
1. Put them under SCSI logging level control
2. Use printk_ratelimited so an excessive number are trimmed

Would you like to include something like this in your
patch set?
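
To make the two suggestions concrete, here is a minimal userspace sketch of gating a message on a logging level and then on an interval/burst budget. It is a model of the idea only, not the kernel's printk_ratelimited implementation: struct ratelimit, ratelimit_allow() and should_log() are hypothetical names, and the rule "print when the configured level is at least the message level" is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of interval/burst rate limiting (not kernel code). */
struct ratelimit {
    uint64_t interval_ms;   /* length of one rate-limit window */
    unsigned burst;         /* messages allowed per window */
    uint64_t window_start;  /* start time of the current window */
    unsigned printed;       /* messages emitted in the current window */
    unsigned missed;        /* messages suppressed in the current window */
    bool started;           /* false until the first message arrives */
};

/* Return true if a message arriving at now_ms may be printed. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    /* Open a fresh window on first use or once the interval has elapsed. */
    if (!rl->started || now_ms - rl->window_start >= rl->interval_ms) {
        rl->window_start = now_ms;
        rl->printed = 0;
        rl->missed = 0;
        rl->started = true;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;   /* count what was trimmed so it can be summarised later */
    return false;
}

/*
 * Assumed gating rule: check the logging level first, so messages that are
 * disabled by level do not consume any of the rate-limit budget.
 */
static bool should_log(unsigned configured_level, unsigned msg_level,
                       struct ratelimit *rl, uint64_t now_ms)
{
    if (configured_level < msg_level)
        return false;
    return ratelimit_allow(rl, now_ms);
}
```

With interval_ms = 5000 and burst = 10 (chosen here to mirror the kernel's default ratelimit interval of 5 seconds and burst of 10), a storm of 100 errors within 100 ms would produce 10 lines and 90 suppressed ones, and the next message after the interval expires is printed again.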

I think we should come to an agreement on where we want to go with scsi
logging first before doing various smaller adjustments. (Although your
example is one that's urgent enough that I'd like to put it in ASAP;
I've had issues with it a few times myself.)

I had a chat with Martin at Linuxcon about these issues, and we were
both in favor of getting rid of the old scsi logging mechanisms and
instead replacing them with an extended version of the scsi tracepoints
that covers all places and dumps all data from the old logging mechanism
that people find useful.

In a few places we'd still want to log normal dev_printk style errors,
and the I/O completion is one of them, even if they really need to be
ratelimited and condensed.

If someone has arguments in favour of keeping the old logging code
I'd love to hear them, but in practice the traceevent code has huge
benefits:

- almost zero overhead if disabled
- can easily be used without any tools through configs, but can be used
even better with tools like trace-cmd or perf
- allows both fine- and coarse-grained selection of events to trace
- allows capturing statistics on each trace point without even enabling the
output
- doesn't have any of the console lockup problems.

I've already been working on updating the scsi logging infrastructure, removing old kludges and streamlining it.
I'm all in favour of moving things over to scsi tracing; in fact I've already moved all the current SCSI_ML_XXX statements to tracepoints in
my current patchset.

Unfortunately I haven't found time to test things out there, and there's the patchset from Yoshihiro which needs review and integration.

As of now I've treated this as rather low priority as no-one seemed to mind and the patchsets will be touching each and every driver.

I'll update the patchset and send it for review.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)