Re: [PATCH v1] Add info log when user enables NOHZ in commandline but NOHZ is not possible

From: Ani Sinha
Date: Fri Jul 09 2021 - 23:58:24 EST


On Wed, Jul 7, 2021 at 4:13 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> On Wed, Jul 07, 2021 at 12:16:15PM +0200, Thomas Gleixner wrote:
> > Ani,
> >
> > On Sun, Jun 27 2021 at 12:58, Ani Sinha wrote:
> >
> > same comment vs. Subject, 'this patch' and 'we'
> >
> > > @@ -930,6 +930,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > > {
> > > #ifdef CONFIG_NO_HZ_FULL
> > > int cpu = smp_processor_id();
> > > + static bool no_tick_warned;
> > >
> > > if (!tick_nohz_full_cpu(cpu))
> > > return;
> > > @@ -937,10 +938,24 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
> > > if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
> > > return;
> > >
> > > - if (can_stop_full_tick(cpu, ts))
> > > + if (can_stop_full_tick(cpu, ts)) {
> > > tick_nohz_stop_sched_tick(ts, cpu);
> > > - else if (ts->tick_stopped)
> > > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > > + if (no_tick_warned) {
> > > + pr_info("NO_HZ_FULL is now enabled in the system.\n");
> > > + no_tick_warned = false;
> > > + }
> > > + } else {
> > > + /*
> > > + * Don't allow the user to think they can get
> > > + * full NO_HZ with this machine.
> > > + */
> > > + if (!no_tick_warned && tick_nohz_full_running) {
> > > + pr_info("NO_HZ_FULL has been disabled in the system.");
> > > + no_tick_warned = true;
> > > + }
> >
> > So this is going to emit this message everytime the NOHZ state of a CPU
> > changes, which is an easy to trigger dmesg flooding from unpriviledged
> > user space.
> >
> > Also the message is misleading because this is a per CPU condition and
> > not a system wide condition.
>
> IIUC, the clock can be marked unstable anytime before or after nohz_full
> is initialized.
>
> So we can do two things:
>
> * If tick_nohz_init() is called after the clock has been marked unstable,
> just don't initialize nohz_full.
>
> * If the clock is marked unstable after tick_nohz_init(), issue a pr_warn()
> from __clear_sched_clock_stable().

As Peter mentions in the other thread, there is already logging for
this transition in the work function. The reason why I sent these
patches was because I was seeing some issues on the hyperV platform
where the sched clock was seemingly marked unstable but no logging was
present. I have bisected that issue and it seems we should fix this
elsewhere, in the hyperV platform init code and not here by adding
more logging, Tglx was right - this is per cpu state and we do see
some toggles between NOHZ enable and disable (some other condition
unrelated to sched clock stability). So adding logs here would add
more confusion and verbosity for reasons other than the one we wish to
alert users for. It would not add any value.

So this patch can be discarded.

Ani