On 3/28/2023 6:53 PM, Armin Wolf wrote:Can you tell me how many BIOS settings think-lmi provides on your machine? Because according to the stacktrace,
Am 28.03.23 um 14:44 schrieb Mirsad Todorovac:
On 3/28/23 14:17, Greg Kroah-Hartman wrote:Hi,
On Tue, Mar 28, 2023 at 02:08:06PM +0200, Mirsad Todorovac wrote:
On 3/28/23 13:59, Mirsad Todorovac wrote:
On 3/28/23 13:28, Greg Kroah-Hartman wrote:
On Tue, Mar 28, 2023 at 01:13:33PM +0200, Mirsad Todorovac wrote:
Hi all,
Here is another kernel memory leak report, just as I thought we
have done with
them by the xhci patch by Mathias.
The memory leaks were caught on an AlmaLinux 8.7 (CentOS) fork
system, running
on a Lenovo desktop box (see lshw.txt) and the newest Linux
kernel 6.3-rc4 commit
g3a93e40326c8 with Mathias' patch for a xhci systemd-devd
triggered leak.
See:
<20230327095019.1017159-1-mathias.nyman@xxxxxxxxxxxxxxx> on LKML.
This leak is also systemd-devd triggered, except for the
memstick_check() leaks
which I was unable to bisect due to the box not booting older
kernels (work in
progress).
unreferenced object 0xffff88ad12392710 (size 96):
comm "systemd-udevd", pid 735, jiffies 4294896759 (age
2257.568s)
hex dump (first 32 bytes):
53 65 72 69 61 6c 50 6f 72 74 31 41 64 64 72 65
SerialPort1Addre
73 73 2c 33 46 38 2f 49 52 51 34 3b 5b 4f 70 74
ss,3F8/IRQ4;[Opt
backtrace:
[<ffffffffae8fb26c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffffae902b49>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffffae8773c9>] __kmalloc_node_track_caller+0x59/0x180
[<ffffffffae866a1a>] kstrdup+0x3a/0x70
[<ffffffffc0d839aa>]
tlmi_extract_output_string.isra.0+0x2a/0x60 [think_lmi]
[<ffffffffc0d83b64>] tlmi_setting.constprop.4+0x54/0x90
[think_lmi]
[<ffffffffc0d842b1>] tlmi_probe+0x591/0xba0 [think_lmi]
[<ffffffffc051dc53>] wmi_dev_probe+0x163/0x230 [wmi]
this "SerialPort1Address" string looks like a BIOS setup option, and
indeed think_lmi allows for
changing BIOS setup options over sysfs. While looking at
current_value_show() in think-lmi.c, i noticed
that "item" holds a string which is allocated with kstrdup(), so it
has to be freed using kfree().
This however does not happen if strbrk() fails, so maybe the memory
leak is caused by this?
Armin Wolf
Hi Armin,
I tried your suggestion, and though it is an obvious improvement and a
leak fix, this
was not the one we were searching for.
I tested the following patch:
diff --git a/drivers/platform/x86/think-lmi.c
b/drivers/platform/x86/think-lmi.c
index c816646eb661..1e77ecb0cba8 100644
--- a/drivers/platform/x86/think-lmi.c
+++ b/drivers/platform/x86/think-lmi.c
@@ -929,8 +929,10 @@ static ssize_t current_value_show(struct kobject
*kobj, struct kobj_attribute *a
/* validate and split from `item,value` -> `value` */
value = strpbrk(item, ",");
- if (!value || value == item || !strlen(value + 1))
+ if (!value || value == item || !strlen(value + 1)) {
+ kfree(item);
return -EINVAL;
+ }
ret = sysfs_emit(buf, "%s\n", value + 1);
kfree(item);
(I would also object to the use of strlen() here, for it is inherently
insecure
against SEGFAULT in kernel space.)
I still get:
[root@pc-mtodorov marvin]# uname -rms
Linux 6.3.0-rc4-armin-patch-00025-g3a93e40326c8-dirty x86_64
[root@pc-mtodorov marvin]# cat /sys/kernel/debug/kmemleak [edited]
unreferenced object 0xffff8eb008ef9260 (size 96):
comm "systemd-udevd", pid 771, jiffies 4294896499 (age 74.880s)
hex dump (first 32 bytes):
53 65 72 69 61 6c 50 6f 72 74 31 41 64 64 72 65 SerialPort1Addre
73 73 2c 33 46 38 2f 49 52 51 34 3b 5b 4f 70 74 ss,3F8/IRQ4;[Opt
backtrace:
[<ffffffff9eafb26c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffff9eb02b49>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffff9ea773c9>] __kmalloc_node_track_caller+0x59/0x180
[<ffffffff9ea66a1a>] kstrdup+0x3a/0x70
[<ffffffffc0eef9aa>] tlmi_extract_output_string.isra.0+0x2a/0x60
[think_lmi]
[<ffffffffc0eefb64>] tlmi_setting.constprop.4+0x54/0x90 [think_lmi]
[<ffffffffc0ef02c1>] tlmi_probe+0x591/0xba0 [think_lmi]
[<ffffffffc0629c53>] wmi_dev_probe+0x163/0x230 [wmi]
[<ffffffff9f1987eb>] really_probe+0x17b/0x3d0
[<ffffffff9f198ad4>] __driver_probe_device+0x84/0x190
[<ffffffff9f198c14>] driver_probe_device+0x24/0xc0
[<ffffffff9f198ed2>] __driver_attach+0xc2/0x190
[<ffffffff9f195ab1>] bus_for_each_dev+0x81/0xd0
[<ffffffff9f197c62>] driver_attach+0x22/0x30
[<ffffffff9f197354>] bus_add_driver+0x1b4/0x240
[<ffffffff9f19a0a2>] driver_register+0x62/0x120
unreferenced object 0xffff8eb018ddbb40 (size 64):
comm "systemd-udevd", pid 771, jiffies 4294896528 (age 74.780s)
hex dump (first 32 bytes):
55 53 42 50 6f 72 74 41 63 63 65 73 73 2c 45 6e USBPortAccess,En
61 62 6c 65 64 3b 5b 4f 70 74 69 6f 6e 61 6c 3a abled;[Optional:
backtrace:
[<ffffffff9eafb26c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffff9eb02b49>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffff9ea773c9>] __kmalloc_node_track_caller+0x59/0x180
[<ffffffff9ea66a1a>] kstrdup+0x3a/0x70
[<ffffffffc0eef9aa>] tlmi_extract_output_string.isra.0+0x2a/0x60
[think_lmi]
[<ffffffffc0eefb64>] tlmi_setting.constprop.4+0x54/0x90 [think_lmi]
[<ffffffffc0ef02c1>] tlmi_probe+0x591/0xba0 [think_lmi]
[<ffffffffc0629c53>] wmi_dev_probe+0x163/0x230 [wmi]
[<ffffffff9f1987eb>] really_probe+0x17b/0x3d0
[<ffffffff9f198ad4>] __driver_probe_device+0x84/0x190
[<ffffffff9f198c14>] driver_probe_device+0x24/0xc0
[<ffffffff9f198ed2>] __driver_attach+0xc2/0x190
[<ffffffff9f195ab1>] bus_for_each_dev+0x81/0xd0
[<ffffffff9f197c62>] driver_attach+0x22/0x30
[<ffffffff9f197354>] bus_add_driver+0x1b4/0x240
[<ffffffff9f19a0a2>] driver_register+0x62/0x120
unreferenced object 0xffff8eb006fe2b40 (size 64):
comm "systemd-udevd", pid 771, jiffies 4294896542 (age 74.724s)
hex dump (first 32 bytes):
55 53 42 42 49 4f 53 53 75 70 70 6f 72 74 2c 45 USBBIOSSupport,E
6e 61 62 6c 65 64 3b 5b 4f 70 74 69 6f 6e 61 6c nabled;[Optional
backtrace:
[<ffffffff9eafb26c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffff9eb02b49>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffff9ea773c9>] __kmalloc_node_track_caller+0x59/0x180
[<ffffffff9ea66a1a>] kstrdup+0x3a/0x70
[<ffffffffc0eef9aa>] tlmi_extract_output_string.isra.0+0x2a/0x60
[think_lmi]
[<ffffffffc0eefb64>] tlmi_setting.constprop.4+0x54/0x90 [think_lmi]
[<ffffffffc0ef02c1>] tlmi_probe+0x591/0xba0 [think_lmi]
[<ffffffffc0629c53>] wmi_dev_probe+0x163/0x230 [wmi]
[<ffffffff9f1987eb>] really_probe+0x17b/0x3d0
[<ffffffff9f198ad4>] __driver_probe_device+0x84/0x190
[<ffffffff9f198c14>] driver_probe_device+0x24/0xc0
[<ffffffff9f198ed2>] __driver_attach+0xc2/0x190
[<ffffffff9f195ab1>] bus_for_each_dev+0x81/0xd0
[<ffffffff9f197c62>] driver_attach+0x22/0x30
[<ffffffff9f197354>] bus_add_driver+0x1b4/0x240
[<ffffffff9f19a0a2>] driver_register+0x62/0x120
There are currently 84 wmi_dev_probe leaks, sized mostly 64 bytes, and
one 96 and two 192 bytes.
I also cannot figure out the mechanism by which current_value_show()
is called, when it is static?
Any idea?
Thanks.
Best regards,
Mirsad
Why aren't you looking at the wmi.c driver? That should be
where the
issue is, not the driver core, right?
thanks,
greg k-h
Hi, Mr. Greg,
Thanks for the quick reply.
I have added CC: for additional developers per
drivers/platform/x86/wmi.c,
however, this seems to me like hieroglyphs. There is nothing
obvious, but
I had not noticed it with v6.3-rc3?
Maybe, there seems to be something off:
949 static int wmi_dev_probe(struct device *dev)
950 {
951 struct wmi_block *wblock = dev_to_wblock(dev);
952 struct wmi_driver *wdriver =
drv_to_wdrv(dev->driver);
953 int ret = 0;
954 char *buf;
955
956 if (ACPI_FAILURE(wmi_method_enable(wblock, true)))
957 dev_warn(dev, "failed to enable device
-- probing anyway\n");
958
959 if (wdriver->probe) {
960 ret = wdriver->probe(dev_to_wdev(dev),
961 find_guid_context(wblock, wdriver));
962 if (ret != 0)
963 goto probe_failure;
964 }
965
966 /* driver wants a character device made */
967 if (wdriver->filter_callback) {
968 /* check that required buffer size
declared by driver or MOF */
969 if (!wblock->req_buf_size) {
970 dev_err(&wblock->dev.dev,
971 "Required buffer size
not set\n");
972 ret = -EINVAL;
973 goto probe_failure;
974 }
975
976 wblock->handler_data =
kmalloc(wblock->req_buf_size,
977 GFP_KERNEL);
978 if (!wblock->handler_data) {
979 ret = -ENOMEM;
980 goto probe_failure;
981 }
982
983 buf = kasprintf(GFP_KERNEL, "wmi/%s",
wdriver->driver.name);
984 if (!buf) {
985 ret = -ENOMEM;
986 goto probe_string_failure;
987 }
988 wblock->char_dev.minor =
MISC_DYNAMIC_MINOR;
989 wblock->char_dev.name = buf;
990 wblock->char_dev.fops = &wmi_fops;
991 wblock->char_dev.mode = 0444;
992 ret = misc_register(&wblock->char_dev);
993 if (ret) {
994 dev_warn(dev, "failed to
register char dev: %d\n", ret);
995 ret = -ENOMEM;
996 goto probe_misc_failure;
997 }
998 }
999
1000 set_bit(WMI_PROBED, &wblock->flags);
1001 return 0;
1002
1003 probe_misc_failure:
1004 kfree(buf);
1005 probe_string_failure:
1006 kfree(wblock->handler_data);
1007 probe_failure:
1008 if (ACPI_FAILURE(wmi_method_enable(wblock,
false)))
1009 dev_warn(dev, "failed to disable
device\n");
char *buf is passed to kfree(buf) uninitialised if
wdriver->filter_callback
is not set.
It seems like a logical error per se, but I don't believe this is
the cause
of the leak?
CORRECTION:
I overlooked the "return 0" in line 1001.
Yeah, and the memory looks to be freed properly in the
wmi_dev_remove()
callback, right?
It would appear so. To verify that:
Alloc:
976 wblock->handler_data = kmalloc(wblock->req_buf_size,
GFP_KERNEL);
<check>
983 buf = kasprintf(GFP_KERNEL, "wmi/%s", wdriver->driver.name);
<check>
989 wblock->char_dev.name = buf;
In lines 1022-1023:
1022 kfree(wblock->char_dev.name);
1023 kfree(wblock->handler_data);
This is why I don't think things should be rushed, but analysed
with clear and
cold head. And with as many eyes as possible :)
The driver stuff is my long-term research interest. To state the
obvious,
the printing and multimedia education and industry in general
would benefit from
the open-source drivers for many instruments that still work, but
are obsoleted
by the producer and require unsupported versions of the OS.
Thank you again for reviewing the bug report, however, ATM I do
not think I have
what it takes to hunt down the memleak. :-/
Do you have a reproducer that you can use to show the problem better?
Unfortunately, the problem doesn't seem to appear during the run of
a particular
test, but immediately on startup of the OS. This makes it awkward to
pinpoint the
exact service that triggered memory leaks. But they would appear to
have to do
with the initialisation of the USB devices, wouldn't they?
There seem to be strings:
"USBPortAccess,Enabled;[Optional:"
"USBBIOSSupport,Enabled;[Optional"
"USBEnumerationDelay,Disabled;[Op"
This seems to be happening during USB initialisation and before any
services.
But I might as well be wrong.
Or can you test kernel patches to verify the problem is fixed or
not if
we send you patches to test?
Certainly, Lord willing, I can test the patches in the same
environment that
mainfeted the bug (or memleak).
Best regards,
Mirsad