Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

From: Tetsuo Handa
Date: Fri Oct 27 2023 - 11:18:48 EST


On 2023/10/26 6:28, Thomas Gleixner wrote:
> I have no idea what the kernel, VirtualPox or Windoze are doing during
> that time. I fear you need to add some debug on your own or if
> VirtualPox has a monitor/debugger you might use that to inspect what the
> guest is doing.

Although VirtualBox has a debugger
( https://www.virtualbox.org/manual/ch12.html#ts_debugger ), I'm not familiar enough
with it to use it; I'd like to debug from the guest side.

I found a minimal kernel config.
Changing https://I-love.SAKURA.ne.jp/tmp/config-6.6-rc7-ok from CONFIG_HZ=250
to CONFIG_HZ=1000 likely reproduces this slowdown problem. This difference
suggests that this problem is timing-dependent; some unexpected event is
happening while bringing up secondary CPUs.
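
For reference, here is a minimal sketch of how the HZ change could be applied
to a saved copy of that config. The helper name set_hz_1000 is hypothetical,
and the sketch assumes the file contains the usual "CONFIG_HZ_250=y",
"# CONFIG_HZ_1000 is not set" and "CONFIG_HZ=250" lines and that GNU sed is
available; I have not checked every line of the config for this.

```shell
# Hypothetical helper: switch a kernel .config from HZ=250 to HZ=1000.
# Assumes the file uses the standard Kconfig.hz choice symbols.
set_hz_1000() {
    cfg="$1"
    sed -i \
        -e 's/^CONFIG_HZ_250=y$/# CONFIG_HZ_250 is not set/' \
        -e 's/^# CONFIG_HZ_1000 is not set$/CONFIG_HZ_1000=y/' \
        -e 's/^CONFIG_HZ=250$/CONFIG_HZ=1000/' \
        "$cfg"
}

# Usage (in the kernel source tree, then let Kconfig re-validate the result):
#   set_hz_1000 .config && make olddefconfig
```

Running "make olddefconfig" afterwards should keep the remaining options
consistent with the new choice.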

Fedora kernels have CONFIG_HZ=1000 and Ubuntu kernels have CONFIG_HZ=250.
I guess that changing the CONFIG_HZ value is nothing special from the point of
view of hypervisors and the host OS.

Can somebody reproduce this problem using a different hypervisor and host OS?
You can check whether booting e.g. Fedora-Everything-netinst-x86_64-Rawhide-20231018.n.0.iso ,
Fedora-Server-netinst-x86_64-37-1.7.iso , Fedora-Everything-netinst-x86_64-34-1.2.iso etc. with
the "nosmp" option added reaches the GUI installer screen much faster than booting these ISO images
without the "nosmp" option. Alternatively, you can build a vanilla kernel using
config-6.6-rc7-ok with CONFIG_HZ changed from 250 to 1000, and boot that kernel directly
with a bare kernel command line like the ones shown below.

Trying

/usr/libexec/qemu-kvm -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

using qemu-kvm 1.5.3-175.el7_9.6.x86_64 on kernel 3.10.0-1160.102.1.el7.x86_64
on a physical host PC does not reproduce this problem.

But trying

/usr/bin/qemu-system-x86_64 -m 4096 -smp 8 -nographic -append 'console=ttyS0,115200n8 panic=1' -no-reboot -kernel /boot/vmlinuz-6.6.0-rc7+

using qemu-system-x86 1:6.2+dfsg-2ubuntu6.15 on kernel 5.15.0-87-generic
on VirtualBox on Windows 11 reproduces a similar slowdown (and the
5.15.0-87-generic kernel sometimes emits soft lockup messages).

Thus, someone might be able to reproduce this problem in a nested virtualization
environment.