Re: [PATCH v3] Add kernel logs when sched clock unstable and NO_HZ_FULL is not possible

From: Ani Sinha
Date: Sat Jul 03 2021 - 13:30:38 EST


ping ...

On Sat, Jun 26, 2021 at 9:49 PM Ani Sinha <ani@xxxxxxxxxxx> wrote:
>
>
>
> On Tue, 22 Jun 2021, Thomas Gleixner wrote:
>
> > On Sun, Jun 20 2021 at 15:43, Ani Sinha wrote:
> >
> > > Commit 4f49b90abb4aca ("sched-clock: Migrate to use new tick
> > > dependency mask model") had also removed the kernel warning
> > > message informing the user that it was not possible to turn
> > > on NO_HZ_FULL. Adding back that log message here. It is
> > > unhelpful when the kernel turns off NO_HZ_FULL silently
> > > without informing anyone.
> > > Also added a kernel log when sched clock is marked as unstable.
> >
> > Don't do two things at once. See Documentation/process/....
>
> Ok I will split up this patch into two.
>
> >
> > Also your subject line want's a proper prefix.
>
> OK will fix.
>
> >
> > > diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> > > index c2b2859ddd82..9f9fe658f8a5 100644
> > > --- a/kernel/sched/clock.c
> > > +++ b/kernel/sched/clock.c
> > > @@ -192,8 +192,11 @@ void clear_sched_clock_stable(void)
> > >
> > > smp_mb(); /* matches sched_clock_init_late() */
> > >
> > > - if (static_key_count(&sched_clock_running.key) == 2)
> > > + if (static_key_count(&sched_clock_running.key) == 2) {
> > > + WARN_ONCE(sched_clock_stable(),
> > > + "sched clock is now marked unstable.");
> >
> > What's the WARN for here? That backtrace is largely uninteresting.
>
> OK to be consistent with other parts of the code, I will replace this with
> pr_warn()
>
> >
> > > - if (can_stop_full_tick(cpu, ts))
> > > + if (can_stop_full_tick(cpu, ts)) {
> > > tick_nohz_stop_sched_tick(ts, cpu);
> > > - else if (ts->tick_stopped)
> > > - tick_nohz_restart_sched_tick(ts, ktime_get());
> > > + } else {
> > > + /*
> > > + * Don't allow the user to think they can get
> > > + * full NO_HZ with this machine.
> > > + */
> > > + WARN_ONCE(tick_nohz_full_running,
> > > + "NO_HZ_FULL will not work for the current system.");
> >
> > can_stop_full_tick() returning false can be transient and then the user
> > still has no idea _why_ this is printed.
> >
> > Also assume the user/admin starts perf and knows he's going to disturb
> > NOHZ full, then _why_ would he be interested in that warning.
> >
>
> If NOHZ is disabled intentionally, that is not an interesting case. I am
> worried about the situation where the user specifies NOHZ option in the
> kernel commandline and the kernel silently disabled this because of one or
> more limitations in the system. I want to address this. All the reasons
> specified in the following commit is still true:
>
> commit e12d0271774fea9fddf1e2a7952a0bffb2ee8e8b
> Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Date: Fri May 10 17:12:28 2013 -0400
>
> nohz: Warn if the machine can not perform nohz_full
>
> If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
> kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
> due to the machine having a unstable clock, warn about it.
>
> We do not want users thinking that they are getting the benefit
> of nohz when their machine can not support it.
>
> Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Li Zhong <zhong@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
>
> This log was removed from the kernel in commit
>
> commit 4f49b90abb4aca6fe677c95fc352fd0674d489bd
> Author: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Date: Wed Jul 22 17:03:52 2015 +0200
>
> sched-clock: Migrate to use new tick dependency mask model
>
> Instead of checking sched_clock_stable from the nohz subsystem to
> verify
> its tick dependency, migrate it to the new mask in order to include it
> to the all-in-one check.
>
>
> and it seems to me that the removal of the log defeats the purpose that
> commit e12d0271774fea9fddf tried to serve.
>
> Please enlighten me as to how to achieve this and I will cook up
> something.
>
> Ani