Re: [PATCH 0/3] clear_warn_once: add timed interval resetting

From: Paul Gortmaker
Date: Tue Dec 01 2020 - 13:06:10 EST


[Re: [PATCH 0/3] clear_warn_once: add timed interval resetting] On 01/12/2020 (Tue 13:49) Petr Mladek wrote:

[...]

> Is this feature requested by RT people?
> Or is it just a possible use-case?
>
> I am not sure that RT is a really good example. The cron job is only
> part of the problem. The message would create a noise on its own.
> It would be shown on console or read/stored by a userspace log
> daemon. I am not sure that RT people would really want to use this.

To be clear, no RT person requested this, and it is just one possible
use case. Enabling the sysadmin to be able to collect more data on
recurrence equally applies to WARN_ONCE as it does printk_once.

> That said, I still do not have strong opinion about the feature.
> It might make sense on its own. But I still see it as a workaround
> for another problem.

I'm not sure how it could be a workaround for anything, really. It
doesn't hide anything -- it would instead possibly cause more output.
It enables a sysadmin to collect more data on recurrence when asked to
by a developer like one of us -- without having to ask the sysadmin to
be rebuilding the kernel or altering the rootfs. "Please boot with
this boot-arg, and run for 3 days and report what you see."

If you get a WARN_ONCE, and choose to ignore it - you have already
decided you are OK with running with something clearly broken (not
good). Being able to easily check if it happens again over time seems
like a good step towards resolving the issue vs. ignoring it.

> Non-trivial periodic tasks sometimes cause problems. And we do not
> know how big avalanche of messages it might restart.

Without specifics, I can't really address what problems you speak of.
But with a 2m minimum, if we add that - we can definitely say the risk
of "big avalanche of messages" is zero and not an issue. We could even
use 5 or 10m minimum w/o really changing what I'm trying to achieve here.

> Also the once is sometimes used on purpose. It prevents repeated delays
> on fast paths. I wonder if it can sometimes even prevent recursion.

Again, I can't really address an open speculation like that, other than
to say if we do have an example of such recursion blocking, we should
code it explicitly, so it doesn't hide as a trap and blow up if someone
removes the "_once" at a later date as a part of a mainline change.

> I know that everything is possible already now. But this patchset
> makes it more visible and easier to use.

So, I have one last idea that may address your concern of people abusing
the reset variable like it is something to be used everyday, blindly.

What if we unconditionally set TAINT_USER once it is used? That also
assists with the fact that such abuse is possible now even without
any of these changes applied, as you have acknowledged.

We'd be making it 100% clear that a person shouldn't be hammering away
on the reset simply because it happens to be there. The taint would
make it clear it isn't a "feature" but instead a debugging/information
gathering aid to only be used on occasion with a specific goal in mind.

I could do a v2 with a TAINT_USER addition, and a conversion to minutes,
with a 5m minimum. But I won't spam people with that unless it resolves
the concerns that you (and anyone else) might have with misuse.

If people don't see the value in it easing data collection once an issue
is spotted, I'm fine with that and will shelf the patch set, and thank
people for their valuable time and feedback.

Paul.
--

>
> Best Regards,
> Petr