Re: Bug related to a 6.6.24 platform/x86 commit signed by you - Enormous memory leak

From: Harshit Mogalapalli
Date: Fri Jul 19 2024 - 08:04:48 EST


Hi Max,


On 19/07/24 15:29, Max Dubois wrote:
Hello,

I am writing to you because you signed off on this buggy commit a while ago.

I don't know how to report it. This is a nasty bug, and I think it is related to this commit from 6.6.24. It is still present from that kernel up to 6.10, but only on 32-bit Linux machines running 32-bit kernels (I tested it on VirtualBox and VMware guests; I don't have real 32-bit machines to test it on):

commit 9a98ab01e3acba830cb0917296a13192fd23f305
Author: Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx>
Date:   Mon Nov 13 12:07:39 2023 -0800

    platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()

    commit f40f939917b2b4cbf18450096c0ce1c58ed59fae upstream.

    'attr_name_kobj' is allocated using kzalloc, but on all the error paths
    it is not freed, hence we have a memory leak.

    Fix the error path before kobject_init_and_add() by adding kfree().

    kobject_put() must be always called after passing the object to
    kobject_init_and_add(). Only the error path which is immediately next
    to kobject_init_and_add() calls kobject_put() and not any other error
    path after it.

    Fix the error handling after kobject_init_and_add() by moving the
    kobject_put() into the goto label err_other_attr_init that is already
    used by all the error paths after kobject_init_and_add().

    Fixes: a34fc329b189 ("platform/x86: hp-bioscfg: bioscfg")
    Cc: stable@xxxxxxxxxxxxxxx # 6.6.x: c5dbf0416000: platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes()
    Cc: stable@xxxxxxxxxxxxxxx # 6.6.x: 5736aa9537c9: platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
    Reported-by: kernel test robot <lkp@xxxxxxxxx>
    Reported-by: Dan Carpenter <error27@xxxxxxxxx>
    Closes: https://lore.kernel.org/r/202309201412.on0VXJGo-lkp@xxxxxxxxx/
    Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@xxxxxxxxxx>
    [ij: Added the stable dep tags]
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20231113200742.3593548-3-harshit.m.mogalapalli@xxxxxxxxxx
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
    Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

I reported this on the Gentoo forums in this thread:

https://forums.gentoo.org/viewtopic-p-8834077.html#8834077

These days 32-bit machines are rarely used, and I think that is why no one has reported it.

The bug wasn't present in kernels before 6.6.24 (example: 6.6.23 is ok).


Thanks for reporting this and sending me an email.

The commit you pointed out, which I authored, is in:

v6.6.4 - 9a98ab01e3ac platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()

So you should have seen this in 6.6.4 as well?

> The bug wasn't present in kernels before 6.6.24 (example: 6.6.23 is ok).

This confuses me, as the commit you pointed out has been present since 6.6.4.


I tested it in various VMware and VirtualBox guests, and it is very easy to reproduce.

You just need a VM with an emulated x86 processor and more than 1 GB of RAM, and then run a few applications, such as some terminals, a web browser, and an audio player.

In the log you will see many complaints about vmalloc allocations, which do not appear on working kernels before 6.6.24 and this commit.

Increasing the vmalloc size as suggested in the log does not help.

From that point on, the VM becomes unresponsive: it closes apps, it doesn't open others, and terminals can't execute simple commands. Sometimes you can't even reboot; sometimes the machine freezes, and sometimes it ends in a complete kernel crash.

This happens 100% of the time; it is easy to reproduce every time on any kernel from 6.6.24 onward (6.7, 6.8, 6.9 and 6.10 are all affected).

Since the kernel is supposed to support 32-bit, I think this needs to be fixed, but I don't know how, or to whom I should report this bug.


I couldn't quickly work out how this error-handling fix could cause the problem, and I think it might be due to another commit.

By 6.6.23/24, which branch are you referring to? The upstream stable branch that Greg maintains, correct?

$ git log --oneline v6.6.23..v6.6.24 drivers/platform/
e8fc78a1c70f platform/x86/intel/tpmi: Change vsec offset to u64

I don't see my commit above in the log between 6.6.23 and 6.6.24.

Could you please clarify ?



Thanks for reading and for helping to resolve this really nasty bug!

Thanks,
Harshit


MD