Re: [PATCH v4 2/3] mm/slub, kunit: add a KUnit test for SLUB debugging functionality
From: Marco Elver
Date: Thu Apr 15 2021 - 07:01:36 EST
On Thu, 15 Apr 2021 at 12:38, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 4/15/21 12:10 PM, Oliver Glitta wrote:
> > ut 13. 4. 2021 o 15:54 Marco Elver <elver@xxxxxxxxxx> napísal(a):
> >>
> >> On Tue, 13 Apr 2021 at 12:07, <glittao@xxxxxxxxx> wrote:
> >> > From: Oliver Glitta <glittao@xxxxxxxxx>
> >> >
> >> > SLUB has resiliency_test() function which is hidden behind #ifdef
> >> > SLUB_RESILIENCY_TEST that is not part of Kconfig, so nobody
> >> > runs it. KUnit should be a proper replacement for it.
> >> >
> >> > Try changing byte in redzone after allocation and changing
> >> > pointer to next free node, first byte, 50th byte and redzone
> >> > byte. Check if validation finds errors.
> >> >
> >> > There are several differences from the original resiliency test:
> >> > Tests create own caches with known state instead of corrupting
> >> > shared kmalloc caches.
> >> >
> >> > The corruption of freepointer uses correct offset, the original
> >> > resiliency test got broken with freepointer changes.
> >> >
> >> > Scratch changing random byte test, because it does not have
> >> > meaning in this form where we need deterministic results.
> >> >
> >> > Add new option CONFIG_SLUB_KUNIT_TEST in Kconfig.
> >> > Because the test deliberatly modifies non-allocated objects, it depends on
> >> > !KASAN which would have otherwise prevented that.
> >>
> >> Hmm, did the test fail with KASAN? Is it possible to skip the tests
> >> and still run a subset of tests with KASAN? It'd be nice if we could
> >> run some of these tests with KASAN as well.
> >>
> >> > Use kunit_resource to count errors in cache and silence bug reports.
> >> > Count error whenever slab_bug() or slab_fix() is called or when
> >> > the count of pages is wrong.
> >> >
> >> > Signed-off-by: Oliver Glitta <glittao@xxxxxxxxx>
> >>
> >> Reviewed-by: Marco Elver <elver@xxxxxxxxxx>
> >>
> >
> > Thank you.
> >
> >> Thanks, this all looks good to me. But perhaps do test what works with
> >> KASAN, to see if you need the !KASAN constraint for all cases.
> >
> > I tried to run tests with KASAN functionality disabled with function
> > kasan_disable_current() and three of the tests failed with wrong
> > errors counts.
> > So I add the !KASAN constraint for all tests, because the merge window
> > is coming, we want to know if this version is stable and without other
> > mistakes.
> > We will take a closer look at that in the follow-up patch.
>
> Agreed. In this context, KASAN is essentially a different implementation of the
> same checks that SLUB_DEBUG offers (and also does other checks) and we excercise
> these SLUB_DEBUG checks by deliberately causing the corruption that they detect
> - so instead, KASAN detects it, as it should. I assume that once somebody opts
> for a full KASAN kernel build, they don't need the SLUB_DEBUG functionality at
> that point, as KASAN is more extensive (On the other hand SLUB_DEBUG kernels can
> be (and are) shipped as production distro kernels where specific targetted
> debugging can be enabled to help find bugs in production with minimal disruption).
> So trying to make both cooperate can work only to some extent and for now we've
> chosen the safer way.
Sounds reasonable. In any case, I'm fine with this version to land and
my Reviewed-by above remains valid. :-)
Thanks,
-- Marco