Re: kmemleak not always catching stuff

From: Catalin Marinas
Date: Fri Sep 08 2017 - 13:45:29 EST


On Tue, Sep 05, 2017 at 10:15:45AM -0400, Steven Rostedt wrote:
> On Mon, 4 Sep 2017 11:09:05 +0100
> Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > On Fri, Sep 01, 2017 at 06:33:11PM -0400, Steven Rostedt wrote:
> > > Now I was thinking that it may be due to the fact that the trampoline
> > > is allocated with module_alloc(), and that has some magic kasan goo in
> > > it. But when forcing the issue with adding the following code:
> > >
> > > void **pblah;
> > > void *blah;
> > >
> > > pblah = kmalloc(sizeof(*pblah), GFP_KERNEL);
> > > blah = module_alloc(PAGE_SIZE);
> > > *pblah = blah;
> > > printk("allocated blah %p\n", blah);
> > > kfree(pblah);
> > >
> > > in a path that I could control, it would catch it only after doing it
> > > several times. I was never able to have kmemleak catch the actual bug
> > > from user space no matter how many times I triggered it.
> >
> > module_alloc() uses vmalloc_exec(), so it is tracked by kmemleak but you
> > probably hit a false negative with the blah pointer lingering somewhere
> > on some stack.
>
> Hmm, could this also be what causes the miss of catching the lingering
> ftrace trampoline?

Not sure (not without some additional support in kmemleak to help track
down the source of false negatives; I'm on a long flight to LA next
week, maybe I manage to hack something up ;)).

BTW, I had a quick look at the trace_selftest_ops() function (without
pretending I understand the ftrace code) and there is one case where
this function can exit without freeing dyn_ops. Is this intentional?

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index cb917cebae29..b17ec642793b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -273,7 +273,7 @@ static int trace_selftest_ops(struct trace_array *tr, int cnt)
goto out_free;
if (cnt > 1) {
if (trace_selftest_test_global_cnt == 0)
- goto out;
+ goto out_free;
}
if (trace_selftest_test_dyn_cnt == 0)
goto out_free;

> > > [do the test]
> > >
> > > # echo scan > /sys/kernel/debug/kmemleak
> >
> > Some heuristics in kmemleak cause the first leak of an object not to be
> > reported (too many false positives). You'd need to do "echo scan" at
> > least twice after an allocation.
>
> OK, is this documented somewhere?

Apparently not. I'll add some notes to the kmemleak documentation.

> > > Most of the times it found nothing. Even when I switched the above from
> > > module_alloc() to kmalloc().
> > >
> > > Is this normal?
> >
> > In general, a leak would eventually appear after a few scans or in time
> > when some memory location is overridden.
> >
> > Yet another heuristics in kmemleak is to treat pointers at some offset
> > inside an object as valid references (because of the container_of
> > tricks). However, the downside is that the bigger the object, the
> > greater chances of finding some random data that looks like a pointer.
> > We could change this logic to require precise pointers above a certain
> > size (e.g. PAGE_SIZE) where the use of container_of() is less likely.
> >
> > Kmemleak doesn't have a way to inspect false negatives but if you are
> > interested in digging further, I could add a "find=0x..." command to
> > print all references to an object during scanning. I also need to find
> > some time to implement a "stopscan" command which uses stop_machine()
> > and skips the heuristics for reducing false positives.
>
> Without the patch in the link above, there's a memory leak with the
> ftrace trampoline with the following commands:
>
> Requires: CONFIG_FUNCTION_TRACER and CONFIG_SCHED_TRACER
>
> # cd /sys/kernel/debug/tracing
> # mkdir instances/foo
> # echo wakeup > instances/foo/current_tracer
> # echo 0 > /proc/sys/kernel/ftrace_enabled
> # echo nop > instances/foo/current_tracer
> # rmdir instances/foo
>
> What the above does, is creates a new (allocated/non-static) buffer in
> the instance directory. Then we enable the wakeup tracer which
> enables function tracing and also creates a dynamic ftrace trampoline
> for it. We disable function tracing for all tracers with the proc
> sysctl ftrace_enabled set to zero. The nop removes the wakeup tracer
> and unregisters its function tracing handler. This is where the leak
> happens. The unregister path sees that function tracing is disabled and
> exits out early, without releasing the trampoline.

Are the ftrace_ops allocated dynamically in this case (and freed when
unregistered)? Otherwise, you may have an ops->trampoline still around
that kmemleak finds.

--
Catalin