RE: [rfc] suppress excessive AER output

From: Zhang, Yanmin
Date: Thu Aug 04 2011 - 01:55:37 EST


Dave,

How about adding a new module parameter, aer_printk_limit, so that user space could reset it at any time?

Yanmin
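
A minimal user-space sketch of the tunable-limit idea, for illustration only. It assumes that writing a new limit also restarts the count, which the suggestion above does not spell out. The helpers aer_set_limit() and aer_should_print() are hypothetical names, and the wiring to an actual module parameter (presumably via module_param()) is not shown.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state; not the real driver. */
static unsigned long aer_printk_limit = 100; /* tunable threshold */
static unsigned long aer_printk_count;       /* reports printed since last reset */

/* Models user space writing a new limit; assumed here to restart the count. */
static void aer_set_limit(unsigned long new_limit)
{
	aer_printk_limit = new_limit;
	aer_printk_count = 0;
}

/* Returns true while fewer than aer_printk_limit reports have been printed. */
static bool aer_should_print(void)
{
	if (aer_printk_count >= aer_printk_limit)
		return false;
	aer_printk_count++;
	return true;
}
```

With this shape, aer_print_error() would return early whenever aer_should_print() is false, and an administrator could re-enable output after fixing the hardware by writing the parameter again.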

-----Original Message-----
From: Dave Jones [mailto:davej@xxxxxxxxxx]
Sent: Thursday, August 04, 2011 6:34 AM
To: Linux Kernel
Cc: Nguyen, Tom L; Zhang, Yanmin
Subject: [rfc] suppress excessive AER output

I have a machine that has developed some kind of problem with
its onboard ethernet. It still boots, but spewed almost 1.5G of text
(2381585 instances of the warning below) before we realised what
was going on, and blacklisted the igb driver.

Is it worth logging every single error when we're flooding like this ?
It seems unlikely that we'll find useful information in amongst that much data
that wasn't already in the first 100 instances.

I picked 100 in the (untested) example patch below arbitrarily, but the exact
value could be smaller, or slightly bigger.

Could we do something like this, maybe?

Dave

diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 3ea5173..4ec88c6 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -153,6 +153,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 {
 	int id = ((dev->bus->number << 8) | dev->devfn);
 	char prefix[44];
+	static unsigned long aer_printk_limit = 0;
+
+	aer_printk_limit++;
+
+	if (aer_printk_limit > 100)
+		return;
+
+	if (aer_printk_limit == 100) {
+		printk(KERN_ERR "Reached limit of 100 AER errors. Further AER output suppressed.\n");
+		return;
+	}
 
 	snprintf(prefix, sizeof(prefix), "%s%s %s: ",
 		(info->severity == AER_CORRECTABLE) ? KERN_WARNING : KERN_ERR,
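
The counting logic in the hunk above can be modelled in user space to see how many messages it lets through. This is only a sketch: aer_classify(), aer_simulate(), and the enum and struct names are hypothetical, and only the order of the checks is taken from the patch. Because the counter is incremented before it is compared, the 100th call prints the suppression notice instead of an error report, so 99 full reports are emitted.

```c
/* What a single call to the patched function would do. */
enum aer_action { AER_PRINT_REPORT, AER_PRINT_NOTICE, AER_SUPPRESS };

#define AER_LIMIT 100UL

/* Mirrors the posted hunk: increment first, then compare. */
static enum aer_action aer_classify(unsigned long *count)
{
	(*count)++;
	if (*count > AER_LIMIT)
		return AER_SUPPRESS;
	if (*count == AER_LIMIT)
		return AER_PRINT_NOTICE;
	return AER_PRINT_REPORT;
}

struct aer_tally {
	unsigned long reports;    /* full error reports printed */
	unsigned long notices;    /* "limit reached" messages printed */
	unsigned long suppressed; /* calls that returned silently */
};

/* Replays the given number of calls and tallies each outcome. */
static struct aer_tally aer_simulate(unsigned long calls)
{
	struct aer_tally t = { 0, 0, 0 };
	unsigned long count = 0;
	unsigned long i;

	for (i = 0; i < calls; i++) {
		switch (aer_classify(&count)) {
		case AER_PRINT_REPORT:
			t.reports++;
			break;
		case AER_PRINT_NOTICE:
			t.notices++;
			break;
		case AER_SUPPRESS:
			t.suppressed++;
			break;
		}
	}
	return t;
}
```

Replaying the 2381585 instances mentioned above through aer_simulate() gives 99 reports, 1 notice, and 2381485 suppressed calls.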