Re: [PATCH] x86/pm: Fix false positive kmemleak report in msr_build_context().

From: Ingo Molnar
Date: Fri Mar 22 2024 - 06:03:30 EST



* Anton Altaparmakov <anton@xxxxxxxxxx> wrote:

> Hi Dave,
>
> > On 14 Mar 2024, at 15:05, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >
> > On 3/14/24 07:26, Anton Altaparmakov wrote:
> >> /* image of the saved processor state */
> >> struct saved_context {
> >> - /*
> >> - * On x86_32, all segment registers except gs are saved at kernel
> >> - * entry in pt_regs.
> >> - */
> >> - u16 gs;
> >> unsigned long cr0, cr2, cr3, cr4;
> >> u64 misc_enable;
> >> struct saved_msrs saved_msrs;
> >> @@ -27,6 +22,11 @@ struct saved_context {
> >> unsigned long tr;
> >> unsigned long safety;
> >> unsigned long return_address;
> >> + /*
> >> + * On x86_32, all segment registers except gs are saved at kernel
> >> + * entry in pt_regs.
> >> + */
> >> + u16 gs;
> >> bool misc_enable_saved;
> >> } __attribute__((packed));
> >
> > Isn't this just kinda poking at the symptoms? This seems to be
> > basically the exact same bug as b0b592cf08, just with a different source
> > of unaligned structure members.
>
> Yes, that is exactly the same bug. That's how we figured out the solution in fact - it is totally the same problem with another struct member...
>
> > There's nothing to keep folks from reintroducing these kinds of issues
> > and evidently no way to detect when they happen without lengthy reproducers.
>
> Correct. But short of adding asserts / documentation that pointers must be aligned or kmemleak won't work or fixing kmemleak (which I expect is not tractical as it would become a lot slower if nothing else) not sure what else can be done.
>
> Given I cannot see any alternative to fixing the kmemleak failures I think it is worth applying this fix.
>
> Unless you have better ideas how to fix this issue?
>
> What I can say is that we run a lot of tests with our CI and applying
> this fix we do not see any kmemleak issues any more whilst without it we
> see hundreds of the above - from a single, simple test run consisting of
> 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got
> 20 failures due to this which is quite a lot. With this fix applied we
> get zero kmemleak related failures.

I turned this tidbit into the following paragraph in the commit:

Testing:

We run a lot of tests with our CI, and after applying this fix we do not
see any kmemleak issues any more whilst without it we see hundreds of
the above report. From a single, simple test run consisting of 416 individual test
cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
which is quite a lot. With this fix applied we get zero kmemleak related failures.

Describing the impact of a fix in a changelog is always helpful.

Thanks,

Ingo