RE: hyper_bf soft lockup on Azure Gen2 VM when taking kdump or executing kexec

From: Michael Kelley
Date: Mon Feb 03 2025 - 16:08:15 EST


From: Thomas Tai <thomas.tai@xxxxxxxxxx> Sent: Thursday, January 30, 2025 12:44 PM
>
> > -----Original Message-----
> > From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> > Sent: Thursday, January 30, 2025 3:20 PM
> > To: Thomas Tai <thomas.tai@xxxxxxxxxx>; mhkelley58@xxxxxxxxx;
> > haiyangz@xxxxxxxxxxxxx; wei.liu@xxxxxxxxxx; decui@xxxxxxxxxxxxx;
> > drawat.floss@xxxxxxxxx; javierm@xxxxxxxxxx; Helge Deller
> > <deller@xxxxxx>; daniel@xxxxxxxx; airlied@xxxxxxxxx;
> > tzimmermann@xxxxxxx
> > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-fbdev@xxxxxxxxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx
> > Subject: RE: hyper_bf soft lockup on Azure Gen2 VM when taking kdump or
> > executing kexec
> >
> > From: Thomas Tai <thomas.tai@xxxxxxxxxx> Sent: Thursday, January 30,
> > 2025 10:50 AM
> > >
> > > Sorry for the typo in the subject title. It should have been 'hyperv_fb soft lockup on
> > > Azure Gen2 VM when taking kdump or executing kexec'
> > >
> > > Thomas
> > >
> > > >
> > > > Hi Michael,
> > > >
> > > > We see an issue with the mainline kernel on the Azure Gen 2 VM when
> > > > trying to induce a kernel panic with sysrq commands. The VM would hang
> > > > with soft lockup. A similar issue happens when executing kexec on the VM.
> > > > This issue is seen only with Gen2 VMs (with UEFI boot). Gen1 VMs with BIOS
> > > > boot are fine.
> > > >
> > > > git bisect identifies that the issue is caused by the commit 20ee2ae8c5899
> > > > ("fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()").
> > > > However, reverting the commit would cause the frame buffer not to work
> > > > on the Gen2 VM.
> > > >
> > > > Do you have any hints on what caused this issue?
> > > >
> > > > To reproduce the issue with kdump:
> > > > - Install mainline kernel on an Azure Gen 2 VM and trigger a kdump
> > > > - echo 1 > /proc/sys/kernel/sysrq
> > > > - echo c > /proc/sysrq-trigger
> > > >
> > > > To reproduce the issue with executing kexec:
> > > > - Install mainline kernel on Azure Gen 2 VM and use kexec
> > > > - sudo kexec -l /boot/vmlinuz --initrd=/boot/initramfs.img --command-line="$( cat /proc/cmdline )"
> > > > - sudo kexec -e
> > > >
> > > > Thank you,
> > > > Thomas
> >
> > I will take a look, but it might be early next week before I can do so.
> >
>
> Thank you, Michael for your help!
>
> > It looks like your soft lockup log below is from the kdump kernel (or the newly
> > kexec'ed kernel). Can you confirm? Also, this looks like a subset of the full log.
>
> Yes, the soft lockup log below is from the kdump kernel.
>
> > Do you have the full serial console log that you could email to me? Seeing
> > everything might be helpful. Of course, I'll try to repro the problem myself
> > as well.
>
> I have attached the complete bootup and kdump kernel log.
>
> File: bootup_and_kdump.log
> Line 1 ... 984 (bootup log)
> Line 990 (kdump kernel booting up)
> Line 1351 (soft lockup)
>
> Thank you,
> Thomas
>

I have reproduced the problem in an Azure VM running Oracle Linux
9.4 with the 6.13.0 kernel. Interestingly, the problem does not occur
in a VM running on a locally installed Hyper-V with Ubuntu 20.04 and
the 6.13.0 kernel. There are several differences in the two
environments: the version of Hyper-V, the VM configuration, the Linux
distro, and the .config file used to build the 6.13.0 kernel. I'll try to
figure out what makes the difference, and then the root cause.
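
One way to start narrowing down the .config difference is to list the
options that differ between the two builds. The following is only a
rough sketch: the helper name config_diff and the file names in the
usage line are hypothetical, and it ignores "# CONFIG_FOO is not set"
lines, so an option that is set in one build and unset in the other
shows up as a single line.

```shell
# Hypothetical helper: print CONFIG_ options that differ between two
# kernel .config files. Lines only in the first file are unindented;
# lines only in the second file are indented with one tab (comm -3).
config_diff() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # Keep only "CONFIG_FOO=value" lines and sort them so that option
    # ordering in the files does not produce spurious differences.
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$1" | LC_ALL=C sort > "$tmp_a"
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$2" | LC_ALL=C sort > "$tmp_b"
    # Suppress column 3 (lines common to both files).
    LC_ALL=C comm -3 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

For example, "config_diff config-azure config-local" (both file names
are placeholders) prints one line per differing option. The kernel
tree's scripts/diffconfig serves a similar purpose.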

Michael