Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute

From: Gerald Schaefer
Date: Mon Nov 22 2021 - 15:14:25 EST

On Mon, 22 Nov 2021 16:02:28 +0100
Vlastimil Babka <vbabka@xxxxxxx> wrote:

> On 11/19/21 20:59, Gerald Schaefer wrote:
> > On Fri, 19 Nov 2021 11:41:38 +0100
> > Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> >> On 11/17/21 20:39, Gerald Schaefer wrote:
> >> > Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results
> >> > in an endless sequence of "No data". This is because slab_debugfs_start()
> >> > does not check for a "past end of file" condition and return NULL.
> >>
> >> I still have no idea how that endless sequence happens.
> >> To get it, we would have to call slab_debugfs_show() repeatedly with such v
> >> that *v == 0. Which should only happen with slab_debugfs_start() with *ppos
> >> == 0. Which your patch won't change because you add a '*ppos > t->count'
> >> condition, so *ppos has to be at least 1 to trigger this.
> >
> > Yes, very strange. After a closer look to fs/seq_file.c, especially
> > seq_read_iter(), it seems that op->next will only be called when m->count == 0,
> > at least in the first while(1) loop. Printing "No data\n" sets m->count
> > to 8, so it will continue after Fill:, then call op->next, which returns NULL
> > and breaks the second while(1) loop, and also calls op->stop. Then it returns
> > from seq_read_iter(), only to be called again, and again, ...
> >
> > Only when op->start returns NULL it will end it for good, probably
> > because seq_read_iter() will then return 0 instead of 8.
> Ah, thanks for investigating.
> > Not sure if
> > there is a better way to fix this than by adding a second "return NULL"
> > to op->start, which feels a bit awkward and makes you wonder why the
> > "return NULL" from op->next is not enough.
> I think it's fine to require op->start to return NULL, even if it didn't
> cause this infinite loop.

Yes, this all seems to work as designed, and it is slowly beginning to
make some sense. The only other way to stop it would be to return from
op->show, w/o printing anything, like it would be done for t->count > 0,
when idx becomes >= t->count. In theory at least, see below.

> >>
> >> But yeah, AFAIK we should detect this in slab_debugfs_start() anyway.
> >> But I think the condition should be something like below, because we are
> >> past end of file already with *ppos == t->count. But if both are 0, we want
> >> to proceed for the "No data" output.
> >
> > Ah ok, I wasn't sure about the "t->count > 0" case, i.e. if the check for
> > "*ppos > t->count" would still be correct there. So apparently it wouldn't,
> > and we need two checks, like you suggested
> >
> >>
> >> // to show the No data
> >> if (!*ppos && !t->count)
> >> return ppos;
> >>
> >> if (*ppos >= t->count)
> >> return ppos;
> >
> > That should be return NULL here, right?
> Doh, right.
> >>
> >> return ppos;
> >>
> >
> > Will send a new patch, unless I find a better way after investigating the
> > endless seq_read_iter() calls mentioned above.
> > Is there an easy way to test the "t->count > 0" case, i.e. what would need
> > to be done to get some other reply than "No data"?
> Hm the debugfs files alloc_tracess/free_traces for any cache with non-zero
> objects (see /proc/slabinfo for that) should have t->count > 0. If the files
> are created for a cache, it means the related SLAB_STORE_USER debugging was
> enabled both during config and boot-time. If you see only a few caches with
> alloc_tracess/free_traces (because they are from e.g. some test module that
> adds SLAB_STORE_USER explicitly) and all happen to have 0 objects, boot with
> slub_debug=U parameter and then all caches will have this enabled and many
> will have >0 objects.

Thanks. While testing this properly, yet another bug showed up. The idx
in op->show remains 0 in all iterations, so I always see the same line
printed t->count times (or infinitely, ATM). Not sure if this only shows
on s390 due to endianness, but the reason is this:

unsigned int idx = *(unsigned int *)v;

IIUC, void *v is always the same as loff_t *ppos, and therefore idx also
should be *ppos. De-referencing the loff_t * with an unsigned int * only
gives the upper 32 bit half of the 64 bit value, which remains 0.

This would be fixed e.g. with

unsigned int idx = (unsigned int) *(loff_t *) v;

With this fixed, my original patch actually also works for t->count > 0,
because then op->show would return w/o printing anything when idx reaches
t->count. For t->count > 0, it would even work w/o any extra checks in
op->start because of that, only "No data" would be printed infinitely.

It probably still makes sense to make this explicit in op->start, by
checking separately for !*ppos && !t->count, and returning NULL for
*ppos >= t->count, as you suggested.

I think I will also make idx an unsigned long again, like it was before
commit 64dd68497be7, and similar to t->count. Not sure if it needs to
be, and with proper casting unsigned int is also possible, but why
change it?