Re: [PATCH 2/4] kcov: add interrupt handling self test

From: Dmitry Vyukov
Date: Tue Jun 11 2024 - 03:53:11 EST


On Wed, 5 Jun 2024 at 11:33, Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Wed, 5 Jun 2024 at 11:18, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Wed, 5 Jun 2024 at 11:10, Marco Elver <elver@xxxxxxxxxx> wrote:
> > >
> > > > Add a boot self test that can catch sprious coverage from interrupts.
> > > > The coverage callback filters out interrupt code, but only after the
> > > > handler updates preempt count. Some code periodically leaks out
> > > > of that section and leads to spurious coverage.
> > > > Add a best-effort (but simple) test that is likely to catch such bugs.
> > > > If the test is enabled on CI systems that use KCOV, they should catch
> > > > any issues fast.
> > > >
> > > > Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > > > Cc: x86@xxxxxxxxxx
> > > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > > Cc: syzkaller@xxxxxxxxxxxxxxxx
> > > >
> > > > ---
> > > >
> > > > In my local testing w/o the previous fix,
> > > > it immidiatly produced the following splat:
> > > >
> > > > kcov: running selftest
> > > > BUG: TASK stack guard page was hit at ffffc90000147ff8
> > > > Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN PTI
> > > > ...
> > > > kvm_set_cpu_l1tf_flush_l1d+0x5/0x20
> > > > sysvec_call_function+0x15/0xb0
> > > > asm_sysvec_call_function+0x1a/0x20
> > > > kcov_init+0xe4/0x130
> > > > do_one_initcall+0xbc/0x470
> > > > kernel_init_freeable+0x4fc/0x930
> > > > kernel_init+0x1c/0x2b0
> > > > ---
> > > > kernel/kcov.c | 28 ++++++++++++++++++++++++++++
> > > > lib/Kconfig.debug | 9 +++++++++
> > > > 2 files changed, 37 insertions(+)
> > > >
> > > > diff --git a/kernel/kcov.c b/kernel/kcov.c
> > > > index c3124f6d5536..04136f80042f 100644
> > > > --- a/kernel/kcov.c
> > > > +++ b/kernel/kcov.c
> > > > @@ -1057,6 +1057,30 @@ u64 kcov_common_handle(void)
> > > > }
> > > > EXPORT_SYMBOL(kcov_common_handle);
> > > >
> > > > +#ifdef CONFIG_KCOV_TEST
> > > > +static void __init selftest(void)
> > > > +{
> > > > + volatile int i;
> > > > +
> > > > + pr_err("running self test\n");
> > > > + /*
> > > > + * Test that interrupts don't produce spurious coverage.
> > > > + * The coverage callback filters out interrupt code, but only
> > > > + * after the handler updates preempt count. Some code periodically
> > > > + * leaks out of that section and leads to spurious coverage.
> > > > + * It's hard to call the actual interrupt handler directly,
> > > > + * so we just loop here for ~400 ms waiting for a timer interrupt.
> > >
> > > Where do the 400 ms come from? I only see that it loops a long time,
> > > but that the timing is entirely dependent on how fast the CPU executes
> > > the loop.
> > >
> > > > + * We set kcov_mode to enable tracing, but don't setup the area,
> > > > + * so any attempt to trace will crash.
> > > > + */
> > > > + current->kcov_mode = KCOV_MODE_TRACE_PC;
> > > > + for (i = 0; i < (1 << 28); i++)
> > > > + ;
> > >
> > > Can't you check jiffies, and e.g. check that actual ~100-500ms have elapsed?
> > >
> > > timeout = jiffies + msecs_to_jiffies(300);
> > > while (!time_after(jiffies, timeout)) {
> > > cpu_relax();
> > > }
> >
> > We can't call any functions. If anything is instrumented, the kernel crashes.
> >
> > But just reading jiffies should be fine, so we can do:
> >
> > unsigned long start = jiffies;
> > while ((jiffies - start) * MSEC_PER_SEC / HZ < 500)
> > ;
>
> I'm quite sure that those helpers are macros, but who knows if that
> will ever change.
>
> The above open-coded version looks reasonable.

Sent v2 with fixes, PTAL.
https://lore.kernel.org/all/cover.1718092070.git.dvyukov@xxxxxxxxxx/T/#t