Re: [clocksource] 8c30ace35d: WARNING:at_kernel/time/clocksource.c:#clocksource_watchdog

From: Paul E. McKenney
Date: Thu Apr 29 2021 - 20:59:22 EST


On Thu, Apr 29, 2021 at 05:24:59PM -0700, Paul E. McKenney wrote:
> On Thu, Apr 29, 2021 at 04:04:11PM -0700, Andi Kleen wrote:
> > > > The idea is to leave the watchdog code in kernel/time/clocksource.c,
> > > > but to move the fault injection into kernel/time/clocksourcefault.c or
> > > > some such. In this new file, new clocksource structures are created that
> > > > use some existing timebase/clocksource under the covers. These then
> > > > inject delays based on module parameters (one sensitive to CPU number,
> > > > the other unconditional). They register these clocksources using the
> > > > normal interfaces, and verify that they are eventually marked unstable
> > > > when the fault-injection parameters warrant it. This is combined with
> > > > the usual checking of the console log.
> > > >
> > > > Or am I missing your point?
> > >
> > > That's what I meant.
> >
> > I still think all this stuff should be in the fault injection framework,
> > like other fault injections, to have a consistent discoverable interface.
> >
> > I actually checked now and the standard fault injection supports boot arguments,
> > so needing it at boot time shouldn't be a barrier.
>
> Per Thomas's feedback, I am in the midst of converting this to a unit
> test implemented as a kernel module, at which point the only fault
> injection will be in the unit test.
>
> At the moment, the code just registers, reads, unregisters, and verifies
> that the bogus unit-test clocksources act normally. Fault injection is
> next on the list for the fine-grained clocksource. Which, as Thomas
> noted, is quite a bit simpler, as I just need to force a delay until
> the clocksource gets marked unstable with no need for fancy counting.

And this is what I currently get on the console from a successful test:

------------------------------------------------------------------------

clocksource_wdtest: --- holdoff=20
clocksource_wdtest: --- Verify jiffies-like uncertainty margin.
clocksource: wdtest-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
clocksource_wdtest: --- Verify tsc-like uncertainty margin.
clocksource: wdtest-ktime: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
clocksource_wdtest: --- tsc-like times: 1619743817068433427 - 1619743817068432303 = 1124.
clocksource_wdtest: --- Watchdog without error injection.
clocksource_wdtest: --- Watchdog with singleton error injection.
clocksource_wdtest: --- Watchdog with doublet error injection, expect console messages.
clocksource: timekeeping watchdog on CPU4: kvm-clock retried 2 times before success
clocksource_wdtest: --- Watchdog with quadruplet error injection, expect clock skew.
clocksource: timekeeping watchdog on CPU8: kvm-clock read-back delay of 401209ns, attempt 4, marking unstable
clocksource_wdtest: --- Marking wdtest-ktime unstable due to clocksource watchdog.
clocksource_wdtest: --- Done with test.

------------------------------------------------------------------------

The code currently looks like a dog's breakfast, so I will clean it
up before sending it out. I will also add the time-readout error
injection to test the other clock-skew code path.

And yes, there are WARNs to verify that skew happens when it is supposed
to and so on.
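
For anyone trying to map the doublet and quadruplet lines in the console
output above onto behavior, here is a rough user-space sketch of the
retry idea. It is not the kernel code and not the test module: the
function names, the simulated clock, and the constants MAX_RETRIES,
MAX_READBACK_NS, and INJECTED_DELAY_NS are all assumptions made up for
illustration. It reads the watchdog before and after the clocksource
under test, retries if the read-back delay is too large, and gives up
once the attempts are exhausted.

```c
#include <stdint.h>

/* Hypothetical values, chosen only for illustration. */
#define MAX_RETRIES       3        /* extra attempts after the first one */
#define MAX_READBACK_NS   100000   /* largest acceptable read-back delay */
#define INJECTED_DELAY_NS 400000   /* size of each simulated delay */

enum wd_result { WD_OK, WD_UNSTABLE };

/* Simulated time base and error-injection state (not kernel state). */
static uint64_t fake_now_ns;
static int inject_remaining;       /* upcoming clocksource reads to delay */

/* Stand-in for reading the watchdog clocksource. */
static uint64_t watchdog_read(void)
{
	fake_now_ns += 100;
	return fake_now_ns;
}

/* Stand-in for reading the clocksource under test, with injected delay. */
static uint64_t clocksource_read(void)
{
	if (inject_remaining > 0) {
		inject_remaining--;
		fake_now_ns += INJECTED_DELAY_NS;
	}
	fake_now_ns += 100;
	return fake_now_ns;
}

/*
 * Do one watchdog check with 'inject' consecutive delayed reads queued.
 * On return, *attempts holds the number of attempts that were made.
 */
static enum wd_result check_once(int inject, int *attempts)
{
	inject_remaining = inject;
	for (int n = 1; n <= MAX_RETRIES + 1; n++) {
		uint64_t wd_start = watchdog_read();

		(void)clocksource_read();
		uint64_t wd_end = watchdog_read();

		*attempts = n;
		if (wd_end - wd_start <= MAX_READBACK_NS)
			return WD_OK;       /* read-back was fast enough */
	}
	return WD_UNSTABLE;                 /* every attempt was delayed */
}
```

With these assumed constants, check_once(2, &n) succeeds on the third
attempt (that is, after two retries), while check_once(4, &n) uses up
all four attempts and reports WD_UNSTABLE, which is the same shape as
the "retried 2 times before success" and "attempt 4, marking unstable"
messages in the log.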

Thanx, Paul