Re: [EXTERNAL] Re: [PATCH] EDAC: update edac printk wrappers to use printk_ratelimited.

From: Tyler Hicks
Date: Wed May 05 2021 - 16:24:02 EST


On 2021-05-05 21:45:01, Borislav Petkov wrote:
> Hi Lei,
>
> On Wed, May 05, 2021 at 07:02:14PM +0000, Lei Wang (DPLAT) wrote:
> > Hi Boris,
>
> first of all, please do not top-post.
>
> > We found a corner case in a production environment where there are
> > ~500 CEs per second. The SoC otherwise functions just fine. Making
> > printk ratelimited reduced CE logging to < 20 per second.
>
> If you want to avoid CE logs flooding dmesg, there's a couple of things
> you can do:
>
> 1. Use drivers/ras/cec.c
>
> 2. Do not load EDAC drivers at all since you don't care about the error
> reports, apparently.

Lei, if you don't care about the CE messages at all, there's also an
edac_mc_log_ce module parameter that can be set to 0 to silence the
message emitted from edac_ce_error():

https://www.kernel.org/doc/html/latest/admin-guide/ras.html#module-parameters

> 3. Fix the CE source: replace the DIMMs, etc.
>
> > Though this is just one case so far, we think moving to
> > printk_ratelimited could benefit broader use as well, by helping
> > control the amount of kernel logging.
>
> No, this will make EDAC driver loading output incomplete when some of
> the messages are omitted due to the ratelimiting. And no, this is not
> going to happen.

Boris, I agree that a more surgical approach than this is needed if Lei
still needs some trace of the CE messages in the logs. Would it be any
more acceptable to add an edac_mc_printk_ratelimited() macro, which uses
printk_ratelimited(), and then call that new macro from
edac_ce_error()?
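
To make the intended behavior concrete, here is a rough userspace model
of such a wrapper. It is only a sketch and not kernel code: struct
rl_state, rl_allow() and mc_printk_ratelimited() are hypothetical names,
and the caller passes the time in explicitly. In the kernel, the macro
would presumably just forward to printk_ratelimited(), whose default
window is, as far as I recall, 10 messages per 5 seconds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's ratelimit state. */
struct rl_state {
	long interval;	/* window length, in caller-supplied time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
	bool started;	/* false until the first call opens a window */
};

/* Return true if a message may be emitted at time 'now'. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);

	/* Open a fresh window on first use or once the old one expires. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/*
 * Hypothetical analogue of the proposed edac_mc_printk_ratelimited():
 * prefix the message like edac_mc_printk() does, but only while the
 * window still has budget. Returns 1 if emitted, 0 if suppressed.
 */
static int mc_printk_ratelimited(struct rl_state *rs, long now, int mc_idx,
				 const char *fmt, ...)
{
	va_list ap;

	if (!rl_allow(rs, now))
		return 0;

	printf("EDAC MC%d: ", mc_idx);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	return 1;
}
```

With interval = 5 and burst = 10, a burst of 500 calls inside one window
emits the first 10 and counts the rest as missed. Only the CE path would
use this, so the driver load messages stay complete.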

If you still don't want those CE errors ratelimited by default, perhaps
a new, non-default mode (2) could be added to the edac_mc_log_ce module
parameter that uses the ratelimited variant?
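
Roughly, the decision in edac_ce_error() would then look like the
sketch below. The enum names and ce_should_log() are made up for
illustration, and I'm assuming that today any non-zero value of
edac_mc_log_ce means "log every CE", so unknown values fall back to
that behavior.

```c
#include <stdbool.h>

/* Hypothetical names for the edac_mc_log_ce values discussed above. */
enum ce_log_mode {
	CE_LOG_OFF = 0,		/* existing: no CE messages */
	CE_LOG_ALL = 1,		/* existing default: log every CE */
	CE_LOG_RATELIMITED = 2,	/* proposed: log via the ratelimited variant */
};

/*
 * Decide whether a CE message should be emitted.
 * 'within_ratelimit' is whatever the ratelimiter reports for this call.
 */
static bool ce_should_log(int mode, bool within_ratelimit)
{
	switch (mode) {
	case CE_LOG_OFF:
		return false;
	case CE_LOG_RATELIMITED:
		return within_ratelimit;
	case CE_LOG_ALL:
	default:
		/* Assumption: other non-zero values keep logging everything. */
		return true;
	}
}
```

The default stays at 1, so nothing changes unless an admin opts in to
mode 2.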

Tyler

>
> HTH.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>