Re: BUG: kworker + systemd-udevd memory leaks found in 6.1.0-rc4

From: Mirsad Goran Todorovac
Date: Mon Nov 28 2022 - 22:35:31 EST


On 10. 11. 2022. 10:20, Greg KH wrote:
On Thu, Nov 10, 2022 at 05:57:57AM +0100, Mirsad Goran Todorovac wrote:
On 04. 11. 2022. 11:40, Mirsad Goran Todorovac wrote:

Dear Sirs,

When building a 6.1.0-rc3 RPM for AlmaLinux 8.6, I enabled
CONFIG_DEBUG_KMEMLEAK=y
and kmemleak reported unreferenced objects in a kworker process:

# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffa01dabff6100 (size 16):
  comm "kworker/u12:4", pid 400, jiffies 4294894771 (age 5284.956s)
  hex dump (first 16 bytes):
    6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
  backtrace:
    [<000000009ff951f6>] __kmem_cache_alloc_node+0x380/0x4e0
    [<00000000451f4268>] __kmalloc_node_track_caller+0x55/0x150
    [<0000000005472512>] kstrdup+0x36/0x70
    [<000000002f797ac4>] kstrdup_const+0x28/0x30
    [<00000000e3f86581>] kvasprintf_const+0x78/0xa0
    [<00000000e15920f7>] kobject_set_name_vargs+0x23/0xa0
    [<000000004158a6c0>] dev_set_name+0x53/0x70
    [<000000001a120541>] memstick_check+0xff/0x384 [memstick]
    [<00000000122bb894>] process_one_work+0x214/0x3f0
    [<00000000fcf282cc>] worker_thread+0x34/0x3d0
    [<0000000002409855>] kthread+0xed/0x120
    [<000000007b02b4a3>] ret_from_fork+0x1f/0x30
unreferenced object 0xffffa01dabff6ec0 (size 16):
  comm "kworker/u12:4", pid 400, jiffies 4294894774 (age 5284.944s)
  hex dump (first 16 bytes):
    6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
  backtrace:
    [<000000009ff951f6>] __kmem_cache_alloc_node+0x380/0x4e0
    [<00000000451f4268>] __kmalloc_node_track_caller+0x55/0x150
    [<0000000005472512>] kstrdup+0x36/0x70
    [<000000002f797ac4>] kstrdup_const+0x28/0x30
    [<00000000e3f86581>] kvasprintf_const+0x78/0xa0
    [<00000000e15920f7>] kobject_set_name_vargs+0x23/0xa0
    [<000000004158a6c0>] dev_set_name+0x53/0x70
    [<000000001a120541>] memstick_check+0xff/0x384 [memstick]
    [<00000000122bb894>] process_one_work+0x214/0x3f0
    [<00000000fcf282cc>] worker_thread+0x34/0x3d0
    [<0000000002409855>] kthread+0xed/0x120
    [<000000007b02b4a3>] ret_from_fork+0x1f/0x30
#

Please find the build config and lshw output attached.

dmesg is useless, as it is filled with events like:

[ 6068.996120] evbug: Event. Dev: input4, Type: 1, Code: 31, Value: 0
[ 6068.996121] evbug: Event. Dev: input4, Type: 0, Code: 0, Value: 0
[ 6069.124145] evbug: Event. Dev: input4, Type: 4, Code: 4, Value: 458762
[ 6069.124149] evbug: Event. Dev: input4, Type: 1, Code: 34, Value: 1
[ 6069.124150] evbug: Event. Dev: input4, Type: 0, Code: 0, Value: 0
[ 6069.196003] evbug: Event. Dev: input4, Type: 4, Code: 4, Value: 458762
[ 6069.196007] evbug: Event. Dev: input4, Type: 1, Code: 34, Value: 0
[ 6069.196009] evbug: Event. Dev: input4, Type: 0, Code: 0, Value: 0
[ 6069.788129] evbug: Event. Dev: input4, Type: 4, Code: 4, Value: 458792
[ 6069.788133] evbug: Event. Dev: input4, Type: 1, Code: 28, Value: 1
[ 6069.788135] evbug: Event. Dev: input4, Type: 0, Code: 0, Value: 0
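
One way to make such a log usable again is to filter out the evbug lines before reading it. The following is only a sketch under the assumption that every noise line has the "[ timestamp] evbug: " prefix seen above; drop_evbug is a hypothetical helper name, not an existing tool.

```python
import re

# Matches the "[ 6068.996120] evbug: " prefix seen in the log above.
EVBUG_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+evbug: ")


def drop_evbug(lines):
    """Return only the log lines that were not emitted by evbug."""
    return [line for line in lines if not EVBUG_RE.match(line)]
```

For example, the lines of a saved dmesg dump can be passed through drop_evbug() and the remainder printed, leaving only messages from other subsystems.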

This bug is confirmed in 6.1-rc4, alongside the "thermald" and "systemd-udevd"
kernel memory leaks, potentially exposing race conditions or other, more
serious bugs.

How is a memory leak a race condition?

The bug is now also confirmed in the Ubuntu 22.04 LTS (jammy) 6.1-rc4
build.

Here is the kmemleak output:

unreferenced object 0xffff9242b13b3980 (size 64):
  comm "kworker/5:3", pid 43106, jiffies 4305052439 (age 71828.792s)
  hex dump (first 32 bytes):
    80 8b a0 f0 42 92 ff ff 00 00 00 00 00 00 00 00 ....B...........
    20 86 a0 f0 42 92 ff ff 00 00 00 00 00 00 00 00 ...B...........
  backtrace:
    [<00000000c5dea4db>] __kmem_cache_alloc_node+0x380/0x4e0
    [<000000002b17af47>] kmalloc_node_trace+0x27/0xa0
    [<000000004c09eee5>] xhci_alloc_command+0x6e/0x180

This is a totally different backtrace from above, how are they related?

This looks like a potential xhci issue. Can you use 'git bisect' to
track down the offending change that caused this?

thanks,

greg k-h

Hello, Greg, Thorsten!

After multiple attempts, my box's UEFI still refuses to run pre-4.17 kernels.
The bisect shows the problem appeared before 4.17, so unless I find what is
causing the black screen when booting pre-4.17 kernels, it's a no-go ... :(

Thanks,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union