Re: [PATCH v12] mm: slub: move sysfs slab alloc/free interfaces to debugfs

From: Vlastimil Babka
Date: Wed Jun 16 2021 - 12:17:41 EST


On 6/16/21 5:50 PM, Faiyaz Mohammed wrote:
>
>
> On 6/16/2021 4:35 PM, Vlastimil Babka wrote:
>> On 6/15/21 5:58 PM, Qian Cai wrote:
>>>
>>>
>>> On 6/11/2021 3:03 PM, Faiyaz Mohammed wrote:
>>>> alloc_calls and free_calls implementation in sysfs have two issues,
>>>> one is PAGE_SIZE limitation of sysfs and other is it does not adhere
>>>> to "one value per file" rule.
>>>>
>>>> To overcome this issues, move the alloc_calls and free_calls
>>>> implementation to debugfs.
>>>>
>>>> Debugfs cache will be created if SLAB_STORE_USER flag is set.
>>>>
>>>> Rename the alloc_calls/free_calls to alloc_traces/free_traces,
>>>> to be inline with what it does.
>>>>
>>>> Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
>>>> Reviewed-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>>> Signed-off-by: Faiyaz Mohammed <faiyazm@xxxxxxxxxxxxxx>
>>>
>>> Reverting this commit on today's linux-next fixed all leaks (hundreds) reported by kmemleak like below,
>>>
>>> unreferenced object 0xffff00091ae1b540 (size 64):
>>> comm "lsbug", pid 1607, jiffies 4294958291 (age 1476.340s)
>>> hex dump (first 32 bytes):
>>> 02 00 00 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b ........kkkkkkkk
>>> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>>> backtrace:
>>> [<ffff8000106b06b8>] slab_post_alloc_hook+0xa0/0x418
>>> [<ffff8000106b5c7c>] kmem_cache_alloc_trace+0x1e4/0x378
>>> [<ffff8000106b5e40>] slab_debugfs_start+0x30/0x50
>>> slab_debugfs_start at /usr/src/linux-next/mm/slub.c:5831
>>> [<ffff8000107b3dbc>] seq_read_iter+0x214/0xd50
>>> [<ffff8000107b4b84>] seq_read+0x28c/0x418
>>> [<ffff8000109560b4>] full_proxy_read+0xdc/0x148
>>> [<ffff800010738f24>] vfs_read+0x104/0x340
>>> [<ffff800010739ee0>] ksys_read+0xf8/0x1e0
>>> [<ffff80001073a03c>] __arm64_sys_read+0x74/0xa8
>>> [<ffff8000100358d4>] invoke_syscall.constprop.0+0xdc/0x1d8
>>> [<ffff800010035ab4>] do_el0_svc+0xe4/0x298
>>> [<ffff800011138528>] el0_svc+0x20/0x30
>>> [<ffff800011138b08>] el0t_64_sync_handler+0xb0/0xb8
>>> [<ffff80001001259c>] el0t_64_sync+0x178/0x17c
>>>
>>
>> I think the problem is here:
>>
>>>> +static void slab_debugfs_stop(struct seq_file *seq, void *v)
>>>> +{
>>>> + kfree(v);
>>>> +}
>>>> +
>>>> +static void *slab_debugfs_next(struct seq_file *seq, void *v, loff_t *ppos)
>>>> +{
>>>> + loff_t *spos = v;
>>>> + struct loc_track *t = seq->private;
>>>> +
>>>> + if (*ppos < t->count) {
>>>> + *ppos = ++*spos;
>>>> + return spos;
>>>> + }
>>>> + *ppos = ++*spos;
>>>> + return NULL;
>>>> +}
>>
>> If we return NULL, then NULL is passed to slab_debugfs_stop and thus we don't
>> kfree ppos. kfree(NULL) is silently ignored.
>>
> I think yes, if NULL passed to kfree, it simply do return.
>> I think as we have private struct loc_track, we can add a pos field there and
>> avoid the kmaloc/kfree altogether.
>>
> Hmm, yes we can add pos field "or" we can use argument "v" mean we can
> update v with pos in ->next() and use in ->show() to avoid the leak
> (kmalloc/kfree).

Can you explain the "or" part more. It's exactly what we already do, no? We use
"v" as you said. The problem is, if next(); returns NULL, then stop() gets the
NULL as "v". It's just what I see in the code of seq_read_iter() and traverse()
in fs/seq_file.c. I don't see another way to say there are no more records to
print - only to return NULL in next().
Ah, ok so we could maybe do the kfree() in next() then before returning NULL,
which is the last moment we have the pointer. But really, if we already have a
loc_track in private, why kmalloc an additional loff_t.

Anyway it seems to me also that
Documentation/filesystems/seq_file.rst should be updated, as the kfree() in
stop() is exactly what it suggests, and it doesn't show how next() indicates
that there are no more records by returning NULL, and what to do about kfree() then.