Re: [PATCH v2] module: check if memory leak by module.

From: Jessica Yu
Date: Fri Mar 31 2017 - 18:18:38 EST


+++ Vaneet Narang [29/03/17 09:23 +0000]:
Hi,

Hmm, how can you track _all_ vmalloc allocations done on behalf of the
module? It is quite some time since I've checked kernel/module.c but
from my vague understading your check is basically only about statically
vmalloced areas by module loader. Is that correct? If yes then is this
actually useful? Were there any bugs in the loader code recently? What
led you to prepare this patch? All this should be part of the changelog!

First of all there is no issue in kernel/module.c. This patch add functionality
to detect scenario where some kernel module does some memory allocation but gets
unloaded without doing vfree. For example
static int kernel_init(void)
{
char * ptr = vmalloc(400 * 1024);
return 0;
}

static void kernel_exit(void)
{
}

Now in this case if we do rmmod then memory allocated by kernel_init
will not be freed but this patch will detect such kind of bugs in kernel module
code.

kmemleak already detects leaks just like this, and it is not just
limited to vmalloc (but also kmalloc, kmem_cache_alloc, etc). See
mm/kmemleak-test.c, it is exactly like your example.

Also, this patch is currently limited to direct vmalloc allocations
from module core code (since you are only checking for vmalloc callers
originating from mod->core_layout, not mod->init_layout, which is
discarded at the end of do_init_module(). If we want to be complete,
we'd have to do another leak check before module init code is freed.

Also We have seen bugs in some kernel modules where they allocate some memory and
gets removed without freeing them and if new module gets loaded in place
of removed module then /proc/vmallocinfo shows wrong information. vmalloc info will
show pages getting allocated by new module. So these logs will help in detecting
such issues.

This is an unfortunate side effect of having dynamically loadable modules.
After a module is gone, sprint_symbol() (which is used to display caller
information in /proc/vmallocinfo) simply cannot trace an address back to
a module that no longer exists, it is a natural limitation, and I'm not really
sure if there's much we can do about it. When chasing leaks like this,
one possibility might be to leave the module loaded so vmallocinfo can report
accurate information, and then compare the reported information after the
module unloads.

And unfortunately, this patch also demonstrates the same problem you're describing:

(1) Load leaky_module and read /proc/vmallocinfo:
0xffffa8570005d000-0xffffa8570005f000 8192 leaky_function+0x2f/0x75 [leaky_module] pages=1 vmalloc N0=1

(2) Unload leaky_module and read /proc/vmallocinfo:
0xffffa8570005d000-0xffffa8570005f000 8192 0xffffffffc038902f pages=1 vmalloc N0=1
^^^ missing caller symbol since module is now gone
On module unload, your patch prints:
[ 289.834428] Module [leaky_module] is getting unloaded before doing vfree
[ 289.835226] Memory still allocated: addr:0xffffa8570005d000 - 0xffffa8570005f000, pages 1
[ 289.836185] Allocating function leaky_function+0x2f/0x75 [leaky_module]

Ok, so far that looks fine. But kmemleak also provides information about the same leak:

unreferenced object 0xffffa8570005d000 (size 64):
comm "insmod", pid 114, jiffies 4294673713 (age 208.968s)
hex dump (first 32 bytes):
e6 7e 00 00 00 00 00 00 0a 00 00 00 16 00 00 00 .~..............
21 52 00 00 00 00 00 00 f4 7e 00 00 00 00 00 00 !R.......~......
backtrace:
[<ffffffff838415ca>] kmemleak_alloc+0x4a/0xa0
[<ffffffff83214df4>] __vmalloc_node_range+0x1e4/0x300
[<ffffffff83214fb4>] vmalloc+0x54/0x60
[<ffffffffc038902f>] leaky_function+0x2f/0x75 [leaky_module]
[<ffffffffc038e00b>] 0xffffffffc038e00b
[<ffffffff83002193>] do_one_initcall+0x53/0x1a0
[<ffffffff831bfca1>] do_init_module+0x5f/0x1ff
[<ffffffff8313189f>] load_module+0x273f/0x2b00
[<ffffffff83131dc6>] SYSC_init_module+0x166/0x180
[<ffffffff83131efe>] SyS_init_module+0xe/0x10
[<ffffffff8384d177>] entry_SYSCALL_64_fastpath+0x1a/0xa9
[<ffffffffffffffff>] 0xffffffffffffffff

(3) Load test_module, which happens to load where leaky_module used to reside in memory:
0xffffa8570005d000-0xffffa8570005f000 8192 test_module_exit+0x2f/0x1000 [test_module] pages=1 vmalloc N0=1
^^^ incorrect caller, because test_module loaded where old caller used to be

(4) Unload test_module and your patch prints:
[ 459.140089] Module [test_module] is getting unloaded before doing vfree
[ 459.140551] Memory still allocated: addr:0xffffa8570005d000 - 0xffffa8570005f000, pages 1
[ 459.141127] Allocating function test_module_exit+0x2f/0x1000 [test_module] <- incorrect caller

So unfortunately this patch also runs into the same problem, reporting
the incorrect caller, and I'm not really convinced that this patch
adds new information that isn't already available with kmemleak and
/proc/vmallocinfo.

Jessica

> static void free_module(struct module *mod)
> {
> + check_memory_leak(mod);
> +

Of course, vfree() has not been called yet. It is the beginning of
free_module(). vfree() is one of the last things you need to do. See
module_memfree(). If I am not missing something, you get pr_err()
everytime a module is unloaded.

This patch is not to detect memory allocated by kernel. module_memfree
will allocated by kernel for kernel modules but our intent is to detect
memory allocated directly by kernel modules and not getting freed.

Regards,
Vaneet Narang