RE: Early kernel panic in dmi_decode when running 32-bit kernel on Hyper-V on Windows 11

From: Michael Kelley
Date: Wed Apr 17 2024 - 11:52:30 EST


From: Jean DELVARE <jdelvare@xxxxxxxx> Sent: Wednesday, April 17, 2024 2:44 AM
>
> Hi Michael and Michael,
>
> Thanks to both of you for all the data and early analysis.
>
> On Tue, 2024-04-16 at 23:20 +0000, Michael Kelley wrote:
> > Thanks for the information.  I now have a repro of "dmidecode"
> > in user space complaining about a zero length entry, when running
> > in a Gen 1 VM with a 64-bit Linux guest.  Looking at
> > /sys/firmware/dmi/tables/DMI, that section of the DMI blob definitely
> > seems messed up.  The handle is 0x0005, which is the next handle in
> > sequence, but the length and type of the entry are zero.  This is a bit
> > different from the type 10 entry that you saw the 32-bit kernel
> > choking on, and I don't have an explanation for that.  After this
> > bogus entry, there are a few bytes I don't recognize, then about
> > 100 bytes of zeros, which also seems weird.
>
> Don't let the type 10 distract you. It is entirely possible that the
> byte corresponding to type == 10 is already part of the corrupted
> memory area. Can you check if the DMI table generated by Hyper-V is
> supposed to contain type 10 records at all?
>
> This smells like the DMI table has been overwritten by "something".
> Either it happened even before boot, that is, the DMI table generated
> by the VM itself is corrupted in the first place, or the DMI table was
> originally good but other kernel code wrote some data at the same
> memory location (I've seen this once in the past, although that was on
> bare metal). That would possibly still be the result of bad information
> provided by the VM (for example 2 "hardware" features being told to use
> overlapping memory ranges).
>
> You should also check the memory map (as displayed early at boot, so
> near the top of dmesg) and verify that the DMI table is located in a
> "reserved" memory area, so that area can't be used for memory
> allocation. Example on my laptop:
>
> # dmidecode 3.4
> Getting SMBIOS data from sysfs.
> SMBIOS 3.1.1 present.
> Table at 0xBA135000.
>
> So the table starts at physical address 0xba135000, which is in the
> following memory map segment:
>
> reserve setup_data: [mem 0x00000000b87b0000-0x00000000bb77dfff] reserved
>
> This memory area is marked as "reserved" so all is well. In my case,
> the table is 2256 bytes in size (not always displayed by dmidecode by
> default, but you can check the size of file
> /sys/firmware/dmi/tables/DMI), so the last byte of the table is at
> 0xba135000 + 0x8d0 - 1 = 0xba1358cf, which is still within the reserved
> range.
>
> If the whole DMI table is NOT located in a "reserved" memory area then
> it can get corrupted by any memory allocation.
>
> If the whole DMI table IS located in a "reserved" memory area, it can
> still get corrupted, but only by code which itself operates on data
> located in a reserved memory area.
>
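
The containment arithmetic above can be written down as a small check. This is only a sketch: `struct mem_range` and `table_within()` are hypothetical names made up for illustration, not anything from the kernel or dmidecode, and the bounds are assumed to be inclusive, as they are printed in the boot-time memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper type: one memory map segment with inclusive
 * bounds, as printed in dmesg ("[mem start-end] type").
 */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Return true if a table of `len` bytes starting at physical address
 * `base` lies entirely inside segment `r`. The last byte of the table
 * is at base + len - 1; the comparison is arranged so that it cannot
 * overflow.
 */
static bool table_within(const struct mem_range *r, uint64_t base,
			 uint64_t len)
{
	assert(r != NULL);

	if (len == 0 || base < r->start || base > r->end)
		return false;

	return len - 1 <= r->end - base;
}
```

With the numbers quoted above (segment 0xb87b0000-0xbb77dfff, table at 0xba135000, 2256 bytes), the check passes. The same inputs would have to be read from dmesg and from the size of /sys/firmware/dmi/tables/DMI on the affected VM.
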
> > But at this point, it's good that I have a repro. It has been a while since
> > I've built and run a 32-bit kernel, but I think I can get that set up with
> > the ability to get output during early boot. I'll do some further
> > debugging with dmidecode and with the 32-bit kernel to figure out
> > what's going on.  There are several mysteries here:  1) Is Hyper-V
> > really building a bad DMI blob, or is something else trashing it?
>
> This is a good question. My guess is that the table gets corrupted
> afterwards, but it is better not to assume and to actually check what the
> table looks like at generation time, from the host's perspective.
>
> > 2) Why does a 64-bit kernel succeed on the putative bad DMI blob,
> > while a 32-bit kernel fails?
>
> Both DMI tables are corrupted, but are they corrupted in the exact same
> way?
>
> >   3) Is dmidecode seeing something different from the Linux kernel?
>
> The DMI table is remapped early at boot time and the result is then
> read from dmidecode through /sys/firmware/dmi/tables/DMI. To be honest,
> I'm not sure if this "remapping" is a one-time copy or if future
> corruption would be reflected in the file. In any case, dmidecode can't
> possibly see a less corrupted version of the table. The different
> outcome is because dmidecode is more robust to invalid input than the
> in-kernel parser.
>
> Note that you can force dmidecode to read the table directly from memory
> by using the --no-sysfs option.

Thanks for all the good input! I'll follow up on these ideas. FYI, I'm
a former Microsoft employee who spent several years doing Linux
kernel work to enable running as a Hyper-V guest. So working with
Hyper-V and Linux is very familiar to me. But I retired about 6 months
ago, so I don't have the internal Microsoft connections that I once
had if help is needed from the Hyper-V side. Other Microsoft folks on
the thread may need to jump in if such help is needed. At this point,
I'm contributing to Linux kernel work as an individual.

In any case, I'll debug things from the Linux guest side and then see if
anything is needed from the Hyper-V side.

Michael

>
>
> > Give me a few days to sort all this out.  And if Linux can be made
> > more robust in the face of a bad DMI table entry, I'll submit a
> > Linux kernel patch for that.
>
> I agree that the in-kernel DMI table parser should not choke on bad
> data. dmidecode has an explicit check on "short entries":
>
> 	/*
> 	 * If a short entry is found (less than 4 bytes), not only it
> 	 * is invalid, but we cannot reliably locate the next entry.
> 	 * Better stop at this point, and let the user know his/her
> 	 * table is broken.
> 	 */
> 	if (h.length < 4)
> 	{
> 		if (!(opt.flags & FLAG_QUIET))
> 		{
> 			fprintf(stderr,
> 				"Invalid entry length (%u). DMI table "
> 				"is broken! Stop.\n\n",
> 				(unsigned int)h.length);
> 			opt.flags |= FLAG_QUIET;
> 		}
> 		break;
> 	}
>
> We need to add something similar to the kernel DMI table parser,
> presumably in dmi_scan.c:dmi_decode_table().
>
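
To make the idea concrete, here is a rough user-space sketch of a table walk that stops at a short entry. It is not the kernel's dmi_decode_table() and not a proposed patch; `dmi_count_entries()` is a hypothetical function written for illustration. It assumes only the usual SMBIOS layout: a 4-byte header (type, length, 16-bit handle), a formatted area of `length` bytes in total, then a string set terminated by two NUL bytes, with type 127 marking the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: walk a DMI table blob and return the number of
 * well-formed entries. Sets *broken to 1 if the walk stopped because
 * an entry claimed a length below the 4-byte header size, in which
 * case the next entry cannot be located reliably.
 */
static size_t dmi_count_entries(const uint8_t *buf, size_t len,
				int *broken)
{
	size_t pos = 0, count = 0;

	assert(buf != NULL && broken != NULL);
	*broken = 0;

	while (pos + 4 <= len) {
		uint8_t type = buf[pos];
		uint8_t length = buf[pos + 1];
		size_t next;

		/* Same test as the dmidecode check quoted above. */
		if (length < 4) {
			*broken = 1;
			break;
		}

		/* Skip the formatted area, then the string set. */
		next = pos + length;
		while (next + 1 < len && (buf[next] || buf[next + 1]))
			next++;
		if (next + 1 >= len)
			break;	/* ran off the end: truncated entry */

		count++;
		if (type == 127)
			break;	/* end-of-table marker */
		pos = next + 2;
	}

	return count;
}
```

Fed a blob in which a valid entry is followed by one with type 0 and length 0, like the handle 0x0005 entry described earlier in the thread, this returns after the first entry with *broken set instead of looping on the zero length.
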
> --
> Jean Delvare
> SUSE L3 Support