Re: [PATCH V11 clocksource 0/6] Do not mark clocks unstable due to delays for v5.13

From: Paul E. McKenney
Date: Sat May 01 2021 - 00:30:40 EST


On Fri, Apr 30, 2021 at 02:52:58PM +0800, Luming Yu wrote:
> On Fri, Apr 30, 2021 at 1:11 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> > > On Thu, Apr 29, 2021 at 9:30 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > Hello!
> > > >
> > > > If there is a sufficient delay between reading the watchdog clock and the
> > > > clock under test, the clock under test will be marked unstable through no
> > > > fault of its own. This series checks for this, doing limited retries
> > > > to get a good set of clock reads. If the clock is marked unstable
> > > > and is marked as being per-CPU, cross-CPU synchronization is checked.
> > > > This series also provides delay injection, which may be enabled via
> > > > kernel boot parameters to test the checking for delays.
> > > >
> > > > Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
> > > > vCPU preemption.
> > > >
> > > > 1. Provide module parameters to inject delays in watchdog.
> > > >
> > > > 2. Retry clock read if long delays detected.
> > > >
> > > > 3. Check per-CPU clock synchronization when marked unstable.
> > > >
> > > > 4. Provide a module parameter to fuzz per-CPU clock checking.
> > > >
> > > > 5. Limit number of CPUs checked for clock synchronization.
> > > >
> > > > 6. Reduce clocksource-skew threshold for TSC.
> > > >
> > > > Changes since v10, based on feedback from Thomas Gleixner, Peter Zijlstra,
> > > > Feng Tang, Andi Kleen, Luming Yu, Xing Zhengju, and the indefatigible
> > > > kernel test robot:
> > > >
> > > > o Automatically compute the uncertainty margin for clocksource, and
> > > > also allow them to be specified manually before that clocksource
> > > > is registered.
> > > >
> > > > o For the automatically computed uncertainty margins, bound them
> > > > below by 100 microseconds (2 * WATCHDOG_MAX_SKEW).
> > > >
> > > > o For the manually specified uncertainty margins, splat (but
> > > > continue) if they are less than 100 microseconds (again 2 *
> > > > WATCHDOG_MAX_SKEW). The purpose of splatting is to discourage
> > > > production use of this clock-skew-inducing debugging technique.
> > > >
> > > > o Manually set the uncertainty margin for clocksource_jiffies
> > > > (and thus refined_jiffies) to TICK_NSEC to compensate for the
> > > > very low frequency of these clocks.
> > > >
> > > > o Manually set the uncertainty margin for clocksource_tsc_early
> > > > to 32 milliseconds.
> > > >
> > > > o Apply numerous "Link:" fields to all patches.
> > > >
> > > > o Add some acks and CCs.
> > > >
> > > > Changes since v9:
> > > >
> > > > o Forgive tsc_early drift, based on feedback from Feng Tang; Xing,
> > > > Zhengjun; and Thomas Gleixner.
> > > >
> > > > o Improve CPU selection for clock-synchronization checking.
> > > >
> > > > Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> > > >
> > > > Changes since v8, based on Thomas Gleixner feedback:
> > > >
> > > > o Reduced clock-skew threshold to 200us and delay limit to 50us.
> > > >
> > > > o Split out a cs_watchdog_read() function.
> > > >
> > > > o Removed the pointless CLOCK_SOURCE_VERIFY_PERCPU from kvm_clock.
> > > >
> > > > o Initialized cs_nsec_max and cs_nsec_min to avoid firsttime checks.
> > > >
> > > > Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> > > >
> > > > Changes since v7, based on Thomas Gleixner feedback:
> > > >
> > > > o Fix embarrassing git-format-patch operator error.
> > > >
> > > > o Merge pairwise clock-desynchronization checking into the checking
> > > > of per-CPU clock synchronization when marked unstable.
> > > >
> > > > o Do selective per-CPU checking rather than blindly checking all
> > > > CPUs. Provide a clocksource.verify_n_cpus kernel boot parameter
> > > > to control this behavior, with the value -1 choosing the old
> > > > check-all-CPUs behavior. The default is to randomly check 8 CPUs.
> > > >
> > > > o Fix the clock-desynchronization checking to avoid a potential
> > > > use-after-free error for dynamically allocated clocksource
> > > > structures.
> > > >
> > > > o Remove redundance "wdagain_nsec < 0" from clocksource_watchdog()
> > > > clocksource skew checking.
> > > >
> > > > o Update commit logs and do code-style updates.
> > > >
> > > > Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> > > >
> > > > Changes since v5:
> > > >
> > > > o Rebased to v5.12-rc5.
> > > >
> > > > Changes since v4:
> > > >
> > > > o Rebased to v5.12-rc1.
> > > >
> > > > Changes since v3:
> > > >
> > > > o Rebased to v5.11.
> > > >
> > > > o Apply Randy Dunlap feedback.
> > > >
> > > > Changes since v2:
> > > >
> > > > o Rebased to v5.11-rc6.
> > > >
> > > > o Updated Cc: list.
> > > >
> > > > Changes since v1:
> > > >
> > > > o Applied feedback from Rik van Riel.
> > > >
> > > > o Rebased to v5.11-rc3.
> > > >
> > > > o Stripped "RFC" from the subject lines.
> > > >
> > > > Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > Documentation/admin-guide/kernel-parameters.txt | 32 +++
> > > > arch/x86/kernel/tsc.c | 1
> > > > b/Documentation/admin-guide/kernel-parameters.txt | 21 ++
> > > > b/arch/x86/kernel/tsc.c | 3
> > > > b/include/linux/clocksource.h | 2
> > > > b/kernel/time/clocksource.c | 23 ++
> > > > b/kernel/time/jiffies.c | 15 -
> > > > include/linux/clocksource.h | 3
> > > > kernel/time/clocksource.c | 227 ++++++++++++++++++++--
> > > > 9 files changed, 304 insertions(+), 23 deletions(-)
> > >
> > > Hi Paul,
> > > using the v11, I added a approve flag and made it work for my early
> > > inject test where tsc is good
> > > through a cross tsc sync test. Ideally with the small tweak, we could
> > > get less tsc issues to debug.
> > > And I'm not sure it would help in real trouble shooting cases. But we
> > > will see if it would help.
> >
> > Thank you for the patch!
> >
> > However, Thomas had me rework this code to put the error injection into
> > a kernel module, so this effect is now obtained in a different way.
> > So I am unable to make use of your patch.
>
> np, thanks for the heads up.
>
> we will also need to measure the tsc sync retest and prove it's robust
> enough to trump the bad decision from clocksource watchdog based on HPET
> or other slow and old clocks while leaving good decisions pass through.
>
> we will re-spin the tsc story when your code is settled and landed in
> the mainline.

My current series exports clocksource_verify_percpu(), which might help
measuring TSC synchronization.

Thanx, Paul