Re: [EXTERNAL] Re: [PATCH] EDAC: update edac printk wrappers to use printk_ratelimited.

From: Tyler Hicks
Date: Wed May 05 2021 - 17:49:05 EST


On 2021-05-05 23:04:44, Borislav Petkov wrote:
> On Wed, May 05, 2021 at 03:23:57PM -0500, Tyler Hicks wrote:
> > Would it be any more acceptable to add an
> > edac_mc_printk_ratelimited() macro, which uses printk_ratelimited(),
> > and then call that new macro from edac_ce_error()?
>
> You guys are way off here: the intent of EDAC drivers is to accurately
> report errors for purposes of counting them and doing analysis on
> that collected data as to whether components are going wrong - not to
> ratelimit them as some nuisance output.
>
> With breaking the EDAC reporting, you're barking up the wrong tree - if
> you don't want to see those errors, do not load the drivers. It is that
> simple.

As I understand it, the idea here wasn't to treat the log messages as a
nuisance that should be completely squelched. The counters are monitored
and provide the definitive way to detect large-scale problems, but the CE
log messages can be an easier-to-discover way for humans to identify
potential problems when, for example, centralized log aggregation and
indexing are in place.

The thought was that the full stream of log messages isn't necessary to
notice that there's a problem when they are being emitted at such a high
rate (500 per second). They're just filling up disk space and/or wasting
network bandwidth at that point. Of course, the best course of action
here is to service the machine, but there's still a period of time
between the CE errors popping up and the machine being serviced.
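
For a rough sense of scale, here is a minimal userspace sketch of
interval/burst ratelimiting. It is not the kernel implementation: the
struct and function names are hypothetical, and the parameters assume
the kernel's default ratelimit settings of a 5 second interval and a
burst of 10 messages per call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; names are illustrative, not the kernel's. */
struct ratelimit_model {
	uint64_t interval_ms;   /* window length in milliseconds */
	unsigned burst;         /* messages allowed per window */
	uint64_t window_start;  /* start of the current window, in ms */
	unsigned printed;       /* messages emitted in this window */
	unsigned missed;        /* messages suppressed in this window */
};

/* Return true if a message arriving at now_ms should be emitted. */
static bool ratelimit_allow(struct ratelimit_model *rs, uint64_t now_ms)
{
	assert(rs->interval_ms > 0 && rs->burst > 0);

	/* Open a new window once the current one has expired. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Count how many messages arriving at rate_hz for `seconds` get emitted. */
static unsigned simulate(unsigned rate_hz, unsigned seconds,
			 uint64_t interval_ms, unsigned burst)
{
	struct ratelimit_model rs = { .interval_ms = interval_ms, .burst = burst };
	uint64_t total = (uint64_t)rate_hz * seconds;
	unsigned emitted = 0;

	for (uint64_t i = 0; i < total; i++) {
		uint64_t now_ms = i * 1000 / rate_hz; /* evenly spaced arrivals */

		if (ratelimit_allow(&rs, now_ms))
			emitted++;
	}
	return emitted;
}
```

Under those assumptions, simulate(500, 60, 5000, 10) emits 120 of the
30000 messages generated in a minute, which would still be enough to
show up in aggregated logs. The counters are not modeled here; the
sketch only covers the log output.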

Tyler