Re: kmemleak splat on copy_process()

From: Andy Lutomirski
Date: Tue May 16 2017 - 19:55:58 EST


On Tue, May 16, 2017 at 6:39 AM, Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:
> Thanks for cc'ing me. The vmalloc allocations have always been tricky
> for kmemleak since there are 2-3 other memory locations with the same
> value as the vmalloc'ed object: vm_struct.addr and vmap_area.va_start;
> occasionally we have vmap_area.va_end pointing to the next
> vmap_area.va_start.
>
> To have a more reliable object reference counting, kmemleak avoids
> scanning most of the vmap_area structure (apart from vmap_area.vm) and
> requires a minimum count of 2 references for a vmalloc'ed object to not
> be considered a leak (that is vm_struct.addr and the caller's location,
> in this case task->stack).
>
>> On Mon 15-05-17 23:53:18, Luis R. Rodriguez wrote:
>> > root@piggy:~# cat /sys/kernel/debug/kmemleak
>> > unreferenced object 0xffffa07500d4c000 (size 16384):
>> > comm "bash", pid 1349, jiffies 4294895999 (age 263.204s)
>> > hex dump (first 32 bytes):
>> > 9d 6e ac 57 00 00 00 00 00 00 00 00 00 00 00 00 .n.W............
>> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> > backtrace:
>> > [<ffffffffa5464cca>] kmemleak_alloc+0x4a/0xa0
>> > [<ffffffffa4fdfe6c>] __vmalloc_node_range+0x20c/0x2b0
>> > [<ffffffffa4e7d1a2>] copy_process.part.37+0x5c2/0x1af0
>> > [<ffffffffa4e7e89f>] _do_fork+0xcf/0x390
>> > [<ffffffffa4e7ec09>] SyS_clone+0x19/0x20
>> > [<ffffffffa4e03b0b>] do_syscall_64+0x5b/0xc0
>> > [<ffffffffa547072b>] return_from_SYSCALL_64+0x0/0x6a
>> > [<ffffffffffffffff>] 0xffffffffffffffff
>
> This stack here was allocated by bash for a child process via
> alloc_thread_stack_node() when CONFIG_VMAP_STACK is enabled. When the
> child died, free_stack_thread() did not call vfree_atomic() but rather
> stored the corresponding vm_struct pointer in cached_stacks[i]. However,
> the original task->stack pointer was lost (child task_struct freed), so
> when scanning the memory kmemleak can only find a single reference to
> the original stack (via vm_struct cached_stacks[i]) rather than 2 as
> required for vmalloc'ed objects.
>
> BTW, you can get more info about a specific object, including the number
> of references found, with:
>
> echo dump=0xffffa07500d4c000 > /sys/kernel/debug/kmemleak
>
> So basically kernel/fork.c has its own thread stack allocator on top of
> vmalloc and kmemleak gets confused. A quick workaround:
>
> ------------8<------------------------------
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 06d759ab4c62..785907fccf67 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -214,6 +214,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> this_cpu_write(cached_stacks[i], NULL);
>
> tsk->stack_vm_area = s;
> + kmemleak_alloc(s->addr, THREAD_SIZE, 2, GFP_ATOMIC);
> local_irq_enable();
> return s->addr;
> }
> @@ -254,6 +255,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
> continue;
>
> this_cpu_write(cached_stacks[i], tsk->stack_vm_area);
> + kmemleak_free(tsk->stack);
> local_irq_restore(flags);
> return;
> }
> ------------8<------------------------------

I would do this differently. The problem only affects cached stacks,
right? How about kmemleal_ignoring when caching a stack and
unignoring when uncaching a stack?

Also, this needs a serious comment. How about "kmemleak does not
presently understand that a reference to a vm_area_struct implies a
reference to the vmalloced memory"?

> A more complex solution (which I'm not yet convinced it works) may be to
> pass the number of references of an objects down to the objects it
> refers. In the vmalloc case we normally have a single reference to
> vm_struct and two to the vmalloc'ed object. If a reference to the
> vmalloc'ed object disappeared we could balance it with an increased
> number of references to the vm_struct object. But this option requires
> more research.

The idea being that, with this change, kmemleak would start to
understand that surplus references to vm_area_struct cause it to think
the memory itself is reachable?

--Andy