Re: Kmemleak infrastructure improvement for task_struct leaks and call_rcu()

From: Catalin Marinas
Date: Wed May 13 2020 - 05:59:53 EST


On Tue, May 12, 2020 at 02:09:30PM -0400, Qian Cai wrote:
>
>
> > On May 12, 2020, at 10:15 AM, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> >
> > In this case it uses kref_get() to increment the refcount. We could add
> > a kmemleak_add_trace() which allocates a new array and stores the stack
> > trace, linked to the original object. Similarly for kref_put().
> >
> > If we do this for each inc/dec call, I'd leave it off as default and
> > only enable it explicitly by cmdline argument or
> > /sys/kernel/debug/kmemleak when needed. In most cases you'd hope there is
> > no leak, so no point in tracking additional metadata. But if you do hit
> > a problem, just enable the additional tracking to help with the
> > debugging.
>
> Well, we would like those testing bots to report kmemleak (I know
> there would be many false positives) with that additional information
> about refcount leaks in case they find any, although I have never seen
> one from those bots so far.

I know the syzkaller guys tried to run the fuzzer with kmemleak enabled
and there were false positives that required human intervention. IIRC
they disabled it eventually. The proposal was for a new feature to
kmemleak to run the scanning under stop_machine() so that no other CPU
messes with linked lists etc. That would make kmemleak more reliable
under heavy load. Another option was to let the system cool down before
running the scanning.

> Since some of those bots will run fuzzers, it would be difficult to
> reproduce. Thus, the option has to be enabled by default somehow.
> Otherwise, they could easily miss it in the first place. I'll look
> into it to see if we could make it fairly low overhead.

I guess we don't need the full stack trace. About 4 function calls
leading up to the refcount modification should be sufficient to get an
idea.
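
To make the idea concrete, here is a rough userspace sketch, not
kmemleak code. It assumes glibc's backtrace() from <execinfo.h> as a
stand-in for the kernel stack-trace helpers, and every name in it
(tracked_obj, ref_trace, obj_get(), obj_put(), TRACE_DEPTH) is made up
for illustration. Each refcount change allocates a small record holding
at most 4 return addresses and links it to the object:

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define TRACE_DEPTH 4	/* frames kept per refcount change (assumed) */

/* Hypothetical record of one refcount change; not a kmemleak structure. */
struct ref_trace {
	int delta;			/* +1 for a get, -1 for a put */
	int nr_frames;			/* valid entries in frames[] */
	void *frames[TRACE_DEPTH];	/* truncated call chain */
	struct ref_trace *next;
};

/* Hypothetical refcounted object with its list of recorded changes. */
struct tracked_obj {
	int refcount;
	struct ref_trace *traces;	/* newest record first */
};

/* Allocate a record, capture a short backtrace and link it to obj. */
int record_trace(struct tracked_obj *obj, int delta)
{
	void *buf[TRACE_DEPTH + 1];
	struct ref_trace *t = calloc(1, sizeof(*t));
	int n, i, skip;

	if (!t)
		return -1;

	n = backtrace(buf, TRACE_DEPTH + 1);
	skip = n > 1 ? 1 : 0;		/* drop this helper's own frame */
	for (i = skip; i < n && t->nr_frames < TRACE_DEPTH; i++)
		t->frames[t->nr_frames++] = buf[i];

	t->delta = delta;
	t->next = obj->traces;
	obj->traces = t;
	return 0;
}

int obj_get(struct tracked_obj *obj)
{
	obj->refcount++;
	return record_trace(obj, +1);
}

int obj_put(struct tracked_obj *obj)
{
	assert(obj->refcount > 0);
	obj->refcount--;
	return record_trace(obj, -1);
}

int count_traces(const struct tracked_obj *obj)
{
	const struct ref_trace *t;
	int n = 0;

	for (t = obj->traces; t; t = t->next)
		n++;
	return n;
}

/* Print every recorded change, newest first, to stderr. */
void dump_traces(const struct tracked_obj *obj)
{
	const struct ref_trace *t;

	for (t = obj->traces; t; t = t->next) {
		fprintf(stderr, "refcount %+d from:\n", t->delta);
		backtrace_symbols_fd(t->frames, t->nr_frames, STDERR_FILENO);
	}
}

void free_traces(struct tracked_obj *obj)
{
	while (obj->traces) {
		struct ref_trace *t = obj->traces;

		obj->traces = t->next;
		free(t);
	}
}
```

The cost is one small allocation per inc/dec, which is why the depth is
capped. Whether that is cheap enough to enable by default is exactly
the open question above.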

--
Catalin