Re: Stalls when starting a VSOCK listening socket: soft lockups, RCU stalls, timeout
From: Matthieu Baerts
Date: Fri Mar 06 2026 - 06:07:30 EST
Hi Thomas,
Thank you for looking into this!
On 06/03/2026 10:57, Thomas Gleixner wrote:
> On Fri, Mar 06 2026 at 06:48, Jiri Slaby wrote:
>> On 05. 03. 26, 20:25, Thomas Gleixner wrote:
>>> Is there simple way to reproduce?
>>
>> Unfortunately not at all. To date, I even cannot reproduce locally, it
>> reproduces exclusively in opensuse build service (and github CI as per
>> Matthieu's report). I have a project in there with packages which fail
>> more often than others:
>> https://build.opensuse.org/project/monitor/home:jirislaby:softlockup
>> But it's all green ATM.
>>
>> Builds of Go 1.24 and tests of rust 1.90 fail the most. The former even
>> takes only ~ 8 minutes, so it's not that intensive build at all. So the
>> reasons are unknown to me. At least, Go apparently uses threads for
>> building (unlike gcc/clang with forks/processes). Dunno about rust.
>
> I tried with tons of test cases which stress test mmcid with threads and
> failed.
On my side, I didn't manage to reproduce it locally either.
> Can you provide me your .config, source version, VM setup (Number of
> CPUs, memory etc.)?
My CI ran into this issue 2 days ago, with and without a debug kernel
config. The kernel being tested was on top of 'net-next', which was on
top of this commit from Linus' tree: fbdfa8da05b6 ("selftests:
tc-testing: fix list_categories() crash on list type").
- Config without debug:
https://github.com/user-attachments/files/25791728/config-run-22657946888-normal-join.gz
- Config with debug:
https://github.com/user-attachments/files/25791960/config-run-22657946888-debug-nojoin.gz
- Just in case, stacktraces available there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22657946888
My tests are executed in VMs I don't control, running a v6.14 kernel
on Azure with 4 vCPUs, 16GB of RAM, and nested KVM support. For more
details about what's in them:
https://github.com/actions/runner-images/blob/ubuntu24/20260302.42/images/ubuntu/Ubuntu2404-Readme.md
From there, a Docker container is started, from which QEMU 10.1.0
(Debian 1:10.1.0+ds-5ubuntu2.2) is launched with 4 vCPUs and 5GB of RAM
using this command:
/usr/bin/qemu-system-x86_64 \
-name mptcpdev \
-m 5120M \
-smp 4 \
-chardev socket,id=charvirtfs5,path=/tmp/virtmevrwrzu5k \
-device vhost-user-fs-device,chardev=charvirtfs5,tag=ROOTFS \
-object memory-backend-memfd,id=mem,size=5120M,share=on \
-numa node,memdev=mem \
-machine accel=kvm:tcg \
-M microvm,accel=kvm,pcie=on,rtc=on \
-cpu host,topoext=on \
-parallel none \
-net none \
-echr 1 \
-chardev file,path=/proc/self/fd/2,id=dmesg \
-device virtio-serial-device \
-device virtconsole,chardev=dmesg \
-chardev stdio,id=console,signal=off,mux=on \
-serial chardev:console \
-mon chardev=console \
-vga none \
-display none \
-device vhost-vsock-device,guest-cid=3 \
-kernel /home/runner/work/mptcp_net-next/mptcp_net-next/.virtme/build/arch/x86/boot/bzImage \
-append 'virtme_hostname=mptcpdev nr_open=1048576 virtme_link_mods=/home/runner/work/mptcp_net-next/mptcp_net-next/.virtme/build/.virtme_mods/lib/modules/0.0.0 virtme_rw_overlay0=/tmp console=hvc0 earlyprintk=serial,ttyS0,115200 virtme_console=ttyS0 psmouse.proto=exps virtme.vsockexec=`/tmp/virtme-console/3.sh` virtme_chdir=home/runner/work/mptcp_net-next/mptcp_net-next virtme_root_user=1 rootfstype=virtiofs root=ROOTFS raid=noautodetect rw debug nokaslr mitigations=off softlockup_panic=1 nmi_watchdog=1 hung_task_panic=1 panic=-1 oops=panic init=/usr/local/lib/python3.13/dist-packages/virtme/guest/bin/virtme-ng-init' \
-gdb tcp::1234 \
-qmp tcp::3636,server,nowait \
-no-reboot
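As a side note, since the subject mentions stalls when starting a VSOCK
listening socket: on the guest side, all that is needed to open such a
socket is a plain AF_VSOCK listener. A minimal Python sketch is below,
just to illustrate the operation involved (the port number is arbitrary,
and the `vsock`/`vhost_vsock` modules must be available; this is not the
exact code virtme-ng runs):

```python
import socket

def vsock_listen(port=1234, backlog=1):
    """Open a VSOCK listening socket, bound to any CID.

    This mirrors the 'start a VSOCK listening socket' step from the
    subject; 'port' is an arbitrary example value.
    """
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.bind((socket.VMADDR_CID_ANY, port))
    s.listen(backlog)
    return s

if __name__ == "__main__":
    try:
        srv = vsock_listen()
        print("listening on vsock port 1234")
        srv.close()
    except OSError as e:
        # The vsock transport may be unavailable outside a VM setup
        # like the QEMU command above (no vhost-vsock device).
        print(f"AF_VSOCK unavailable: {e}")
```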
It is possible to launch the same command locally, using the same QEMU
version (but not the same host kernel), with the help of Docker:
$ cd <kernel source code>
# docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
    -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
    manual normal
This will build a new kernel in O=.virtme/build, launch it and give you
access to a prompt.
After that, you can also use the "auto" mode with the last built image:
it boots the VM, only prints "OK", stops, and retries if there were no
errors:
$ cd <kernel source code>
$ echo 'echo OK' > .virtme-exec-run
# i=1; \
  while docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm \
      -it --privileged mptcp/mptcp-upstream-virtme-docker:latest \
      vm auto normal; do \
    echo "== Attempt: $i: OK =="; \
    i=$((i+1)); \
  done; \
  echo "== Failure after $i attempts =="
> I tried to find it on that github page Matthieu mentioned but I'm
> probably too stupid to navigate this clicky interface.
I'm sorry about that; I understand the interface is not very clear. Do
not hesitate to tell me if you need anything else from me.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.