Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

From: Paul Gortmaker
Date: Wed Dec 09 2020 - 12:23:10 EST


[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 09/12/2020 (Wed 17:37) Petr Mladek wrote:

> On Thu 2020-11-26 01:30:26, Paul Gortmaker wrote:
> > The existing clear_warn_once functionality is currently a manually
> > issued state reset via the file /sys/kernel/debug/clear_warn_once when
> > debugfs is mounted. The idea being that a developer would be running
> > some tests, like LTP or similar, and want to check reproducibility
> > without having to reboot.
> >
> > But you currently can't make use of clear_warn_once unless you've got
> > debugfs enabled and mounted - which may not be desired by some people
> > in some deployment situations.
> >
> > The functionality added here allows for periodic resets in addition to
> > the one-shot reset it already had. Then we allow for a boot-time setting
> > of the periodic resets so it can be used even when debugfs isn't mounted.
> >
> > By having a periodic reset, we also open the door for having the various
> > "once" functions act as long period ratelimited messages, where a sysadmin
> > can pick an hour or a day reset if they are facing an issue and are
> > wondering "did this just happen once, or am I only being informed once?"
>
> OK, I though more about it and I NACK this patchset.

Not a problem. Thanks again for your time and explaining your thoughts.

At least it is out there if anyone wants to use it and they can follow
the discussion here when considering the pros/cons of doing so.

Paul.
--

>
> My reason:
>
> 1. The primary purpose was to provide a way to reset warn_once() without
> debugfs. From this POV, the solution is rather complicated: timers
> and another kernel parameter.
>
> 2. I am not aware of any convincing argument why debugfs could not be
> mounted on the debugged system.
>
> 3. Debugfs provides many more debugging facilities. It is designed for
> this purpose. It does not look like a good strategy to provide
> alternative interfaces just to avoid it.
>
> 4. There were mentioned several other use cases for this feature,
> like RT systems. But it was not clear that it was really needed
> or that people would really use it.
>
> 5. Some code might even rely on that it is called only once, see commit
> dfbf2897d00499f94cd ("bug: set warn variable before calling
> WARN()") or the recent
> https://lore.kernel.org/r/20201029142406.3c46855a@xxxxxxxxxxxxxxxxxx
>
> It should better stay as debugging feature that should be used with
> care.
>
>
> 6. It creates system wide ratelimited printk().
>
> We have printk_ratelimited() for this. And it is quite problematic.
> It is supposed to prevent flood of printk() messages. But it does
> not work well because the limits depend on more factors, like:
> system size, conditions, console speed.
>
> Yes, the proposed feature is supposed to solve another problem
> (lack of messages). But it is a global action that would
> re-enable >1000 messages that were limited to be printed
> only once because they could be too frequent. As a result:
>
> + it might cause flood of printk() messages
>
> + it is hard to define a good system wide time limit;
> it was even unclear what should be the lower limit.
>
> + it will restart the messages at some "random" point,
> so that the relation of the reported events would
> be unclear.
>
> From the API point of view:
>
> + printk_ratelimited() is used when we want to see that a
> problem is still there. It is per-message setting.
>
> + printk_once() is used when even printk_ratelimited() would
> be too much. It is per-message setting.
>
> + The new printk_repeated_once() is a strange mix of this two
> with the global setting. It does not fit much.
>
>
> Best Regards,
> Petr
>
> PS: I did not answer your last mail because it looked like an endless
> fight over words or point of views. I decided to make a summary
> of my view instead. These are reason why I nacked it.
>
> I know that there might be different views but so far no arguments
> changed mine. And I do not know how to explain it better.