Re: kmemleak not always catching stuff

From: Dmitry Vyukov
Date: Sat Sep 02 2017 - 06:35:38 EST


On Sat, Sep 2, 2017 at 12:33 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> Hi,
>
> Recently kmemleak discovered a bug in my code where an allocated
> trampoline for a ftrace function tracer wasn't freed due to an exit
> path. The thing is, kmemleak was able to catch this 100% when it was
> triggered by one of my ftrace selftests that happen at bootup. But when
> I trigger the issue from user space after bootup finished, it would not
> catch it.
>
> Now I was thinking that it may be due to the fact that the trampoline
> is allocated with module_alloc(), and that has some magic kasan goo in
> it. But when forcing the issue with adding the following code:
>
> void **pblah;
> void *blah;
>
> pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);
> blah = module_alloc(PAGE_SIZE);
> *pblah = blah;
> printk("allocated blah %p\n", blah);
> kfree(pblah);
>
> in a path that I could control, it would catch it only after doing it
> several times. I was never able to have kmemleak catch the actual bug
> from user space no matter how many times I triggered it.
>
> # dmesg |grep kmemleak
> [ 16.746832] kmemleak: Kernel memory leak detector initialized
> [ 16.746888] kmemleak: Automatic memory scanning thread started
>
> And then I would do:
>
> # echo scan=on > /sys/kernel/debug/kmemleak
>
> [do the test]
>
> # echo scan > /sys/kernel/debug/kmemleak
>
> Most of the time it found nothing. Even when I switched the above from
> module_alloc() to kmalloc().
>
> Is this normal?


Hi,

We've caught some leaks triggered from userspace, so generally it
works. But I have never tried to analyze false negatives; that is
generally hard because you don't know where they are to begin with.
For such tools it's generally useful to look at false negatives once
they come to light, because frequently that allows us to fix bugs.

Having said that, kmemleak has inherent false negatives due to the
fact that it does not have precise information about live data. It
does, of course, look at the status of heap objects (allocated/freed),
but it will still treat as live data padding on the stack, padding in
heap objects, uninitialized parts of heap objects, dead slots on the
stack, etc. So I guess your pointer just stays in one of these dead
slots, and kmemleak still finds it there and does not report the leak.

I assume that the task that triggered the leak has exited by the time
you do the scan, right? The stack is a common place for these dead
pointers. I don't know how much proactive zeroing kmemleak enables.
E.g. does it zero heap blocks on allocation? Does it zero task stacks
on creation? Perhaps we can do more of this.
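
To illustrate the dead-slot effect, here is a minimal userspace sketch.
It is not kmemleak code: struct tracked_obj, scan_references() and
leak_reported() are hypothetical names made up for this example, and
the "stack" is just a local array. It only models the idea that a
conservative scanner treats every word as a possible pointer, so a
stale copy of an address hides the leak unless the slot is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one tracked allocation: [start, start + size). */
struct tracked_obj {
	uintptr_t start;
	size_t size;
};

/*
 * Conservative scan: treat every aligned word in [mem, mem + len) as a
 * potential pointer. Returns 1 if any word points into obj, else 0.
 */
int scan_references(const void *mem, size_t len,
		    const struct tracked_obj *obj)
{
	size_t i;

	assert(len % sizeof(uintptr_t) == 0);
	for (i = 0; i < len; i += sizeof(uintptr_t)) {
		uintptr_t word;

		memcpy(&word, (const char *)mem + i, sizeof(word));
		if (word >= obj->start && word < obj->start + obj->size)
			return 1;
	}
	return 0;
}

/*
 * Returns 1 if the scanner would report the block as leaked, 0 if a
 * stale copy of its address in a dead slot makes it look referenced.
 * zero_dead_slots models proactive zeroing of memory that is no
 * longer live.
 */
int leak_reported(int zero_dead_slots)
{
	static char leaked[64];           /* stands in for the leaked block */
	uintptr_t fake_stack[8] = { 0 };  /* stands in for a task stack */
	struct tracked_obj obj = { (uintptr_t)leaked, sizeof(leaked) };

	fake_stack[5] = (uintptr_t)leaked; /* slot was live, is now dead */
	if (zero_dead_slots)
		fake_stack[5] = 0;

	return !scan_references(fake_stack, sizeof(fake_stack), &obj);
}
```

In this model leak_reported(0) misses the leak and leak_reported(1)
reports it, which is the reason more proactive zeroing could help.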

Also, since you CCed kasan-dev, is it related to KASAN? Does it happen
only when KASAN is enabled? Does it happen without KASAN? I suspect
that KASAN's quarantine can have a very negative effect on kmemleak. I
think we need to do better integration there and tell kmemleak that
quarantined objects are not live. Also, does kmemleak know about the
actual size of heap objects (what the user asked for)? If not, then
KASAN has that info and could pass it to kmemleak.
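
A small userspace sketch of why the requested size matters. Again this
is not kernel code: SLOT_SIZE, words_contain() and stale_pointer_seen()
are hypothetical, and the 64-byte slot and 24-byte request are
arbitrary numbers picked for illustration. It assumes a slot that is
reused for a smaller object, with an old pointer left in the tail that
the new owner never touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE 64 /* hypothetical allocator size class */

/* Returns 1 if any aligned word in the first len bytes equals target. */
int words_contain(const unsigned char *buf, size_t len, uintptr_t target)
{
	size_t i;

	assert(len <= SLOT_SIZE);
	for (i = 0; i + sizeof(uintptr_t) <= len; i += sizeof(uintptr_t)) {
		uintptr_t word;

		memcpy(&word, buf + i, sizeof(word));
		if (word == target)
			return 1;
	}
	return 0;
}

/*
 * Scans the first scan_len bytes of a reused slot and returns 1 if the
 * stale pointer left by the previous owner is found, 0 otherwise.
 */
int stale_pointer_seen(size_t scan_len)
{
	static char leaked[16];          /* block the stale pointer refers to */
	unsigned char slot[SLOT_SIZE];
	uintptr_t stale = (uintptr_t)leaked;
	size_t requested = 24;           /* what the new owner asked for */

	memset(slot, 0, sizeof(slot));
	memcpy(slot + 40, &stale, sizeof(stale)); /* left by previous owner */
	memset(slot, 0, requested);      /* new owner initializes only its part */

	return words_contain(slot, scan_len, stale);
}
```

Scanning the whole slot, stale_pointer_seen(SLOT_SIZE), finds the stale
pointer, while scanning only the requested 24 bytes does not. Whether
kmemleak already scans only the requested size is exactly the open
question above.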