Re: Bug related with a 6.6.24 platform/x86 commit signed by you - Enormous memory leak

From: Dan Carpenter
Date: Fri Jul 19 2024 - 11:02:07 EST


On Fri, Jul 19, 2024 at 02:32:15PM +0200, Thomas Weißschuh wrote:
> On Fri, Jul 19, 2024 at 05:34:23PM GMT, Harshit Mogalapalli wrote:
> The reporting really should figure out which specific release or commit
> is introducing the issue. And if mainline or 6.6.41 are also affected.
>
> The linked gentoo forum thread has some actual kernel logs:
>
> Jul 16 00:01:10 [kernel] alloc_vmap_area: 133 callbacks suppressed
> Jul 16 00:01:10 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
> - Last output repeated 9 times -
> Jul 16 00:01:15 [kernel] alloc_vmap_area: 240 callbacks suppressed
> Jul 16 00:01:15 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
> - Last output repeated 9 times -
> Jul 16 00:01:17 [kernel] warn_alloc: 3 callbacks suppressed
> Jul 16 00:01:17 [kernel] Web Content: vmalloc error: size 8192, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
> Jul 16 00:01:17 [kernel] CPU: 1 PID: 2761 Comm: Web Content Not tainted 6.6.38-gentoo #1
> Jul 16 00:01:17 [kernel] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B32.2305221830 05/22/2023
> Jul 16 00:01:17 [kernel] Call Trace:
> Jul 16 00:01:17 [kernel] dump_stack_lvl+0x32/0x41
> Jul 16 00:01:17 [kernel] dump_stack+0xd/0x10
> Jul 16 00:01:17 [kernel] warn_alloc+0xab/0x111
> Jul 16 00:01:17 [kernel] __vmalloc_node_range+0x73/0x345
> Jul 16 00:01:17 [kernel] __vmalloc_node+0x55/0x5d
> Jul 16 00:01:17 [kernel] ? bpf_prog_alloc_no_stats+0x1f/0xcd
> Jul 16 00:01:17 [kernel] __vmalloc+0x14/0x16
> Jul 16 00:01:17 [kernel] ? bpf_prog_alloc_no_stats+0x1f/0xcd
> Jul 16 00:01:17 [kernel] bpf_prog_alloc_no_stats+0x1f/0xcd
> Jul 16 00:01:17 [kernel] bpf_prog_alloc+0x13/0x9f
> Jul 16 00:01:17 [kernel] bpf_prog_create_from_user+0x47/0xbd
> Jul 16 00:01:17 [kernel] ? kprobe_free_init_mem+0x4c/0x4c
> Jul 16 00:01:17 [kernel] do_seccomp+0x176/0x7ac
> Jul 16 00:01:17 [kernel] ? __ia32_sys_prctl+0x47/0x5bf
> Jul 16 00:01:17 [kernel] __ia32_sys_seccomp+0x10/0x12
> Jul 16 00:01:17 [kernel] ia32_sys_call+0xd09/0x1063
> Jul 16 00:01:17 [kernel] __do_fast_syscall_32+0x7a/0x99
> Jul 16 00:01:17 [kernel] do_fast_syscall_32+0x29/0x5b
> Jul 16 00:01:17 [kernel] do_SYSENTER_32+0x15/0x17
> Jul 16 00:01:17 [kernel] entry_SYSENTER_32+0x98/0xf8
> Jul 16 00:01:17 [kernel] EIP: 0xb7fc856d
>
> The lines with "size 20480" repeat *a lot*, it could be the issue.
>

The interesting thing about that is the working kernel had tons of these
allocation failures as well.
https://bugzilla.kernel.org/show_bug.cgi?id=219061
See the attachment called "This is a session with the last WORKING
KERNEL 6.6.23, NO ERRORS, everything fine".
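
To compare the two sessions on numbers rather than by eye, something
like the helper below could count the failures in each saved log. This
is only a sketch: the function name is made up for illustration, and the
pattern is taken from the log lines quoted above. Note that the syslog
daemon collapses repeats ("Last output repeated 9 times") and the kernel
rate-limits the message, so the count is a lower bound.

```shell
# Hypothetical helper: count "vmap allocation ... failed" lines in a
# saved kernel log. Prints the number of matching lines.
count_vmap_failures() {
    grep -c 'vmap allocation for size [0-9]* failed' "$1"
}
```

Running it once on the 6.6.23 log and once on the 6.6.38 log would show
whether the failure rate actually differs between the two kernels.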

I don't think the problem is Harshit's patch. It should be easy enough
to do a `git show 9a98ab01e3ac | patch -p1 -R` and test the results.
If that doesn't fix the bug, then it would be nice to do a git bisect
between v6.6.23 and v6.6.24.
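
Roughly, the two steps would look like this. It is an untested sketch
that assumes a linux-stable checkout with the v6.6.23 and v6.6.24 tags,
sitting on a tree that already contains the commit. The function names
are only there to keep the steps apart.

```shell
# Sketch only: run from the top of a linux-stable checkout.

# Step 1: reverse-apply the suspect commit's diff to the working tree,
# then rebuild, boot and check whether the leak is gone.
revert_suspect() {
    git show 9a98ab01e3ac | patch -p1 -R
}

# Step 2: if the leak is still there, bisect between the two releases.
start_bisect() {
    git bisect start
    git bisect bad v6.6.24
    git bisect good v6.6.23
    # After this, build and boot each kernel git checks out and mark it
    # with `git bisect good` or `git bisect bad` until git names a commit.
}
```

Undo the reverse-applied patch (for example with `git checkout -- .`)
before starting the bisect, since git bisect needs to switch commits on
a clean tree.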

regards,
dan carpenter