Re: [patch V3 00/12] rseq: Implement time slice extension mechanism
From: Mathieu Desnoyers
Date: Wed Nov 12 2025 - 15:40:18 EST
On 2025-11-12 01:30, Prakash Sangappa wrote:
[...]
The problem reproduces on a 2-socket AMD (384 CPUs) bare-metal system.
It occurs soon after system boot-up. It does not reproduce on a 64-CPU VM.
I managed to grep for the ‘mksquashfs’ command that was executing, which triggers the panic.
# ps -ef | grep mksquash.
root 16614 10829 0 05:55 ? 00:00:00 mksquashfs /dev/null /var/tmp/dracut.iLs0z0/.squash-test.img -no-progress -comp xz
[...]
..
[ 65.143712] cid bitmask ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff 7fffffffffffffff
[ 65.143767] pid 16614, exec mksquashfs, maxcids 175 percpu 0 pcputhr 0, users 140 nrcpus_allwd 384
[ 65.143769] cid bitmask ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
It's weird that the cid bitmask is all f values (all bits set). Aren't those
zeroed on mm init?
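To put a number on "all f", here is a minimal userspace sketch (not kernel code) that counts the set bits in the two dumped masks. It assumes each space-separated token in the "cid bitmask" lines is one 64-bit word; word order does not affect the count. The helper name count_set_bits() and the CID_DUMP_DEMO macro are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count the set bits across an array of 64-bit words from a bitmask dump. */
static unsigned int count_set_bits(const uint64_t *words, size_t nwords)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint64_t w = words[i];

		/* Each iteration clears the lowest set bit of w. */
		while (w) {
			w &= w - 1;
			total++;
		}
	}
	return total;
}

/* Words copied from the first "cid bitmask" line in the trace. */
static const uint64_t dump_first[6] = {
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0x7fffffffffffffffULL,
};

/* Words copied from the second "cid bitmask" line in the trace. */
static const uint64_t dump_second[6] = {
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
	0xffffffffffffffffULL, 0xffffffffffffffffULL, 0xffffffffffffffffULL,
};

#ifdef CID_DUMP_DEMO
int main(void)
{
	printf("first dump:  %u bits set\n", count_set_bits(dump_first, 6));
	printf("second dump: %u bits set\n", count_set_bits(dump_second, 6));
	return 0;
}
#endif
```

Built with -DCID_DUMP_DEMO, this reports 383 and 384 set bits. If the mask printed after the pid 16614 line belongs to that mm, then 384 set bits matches nrcpus_allwd 384 but is far above the maxcids 175 and users 140 reported on the same line.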
Followed by the panic.[...]
[ 99.979256] watchdog: CPU114: Watchdog detected hard LOCKUP on cpu 114
..
As you can see, at least when it cannot find available CIDs, it is in per-task mm cid mode.
Perhaps it is taking longer to drop used CIDs? I have not delved into the mm cid management.
Hopefully you can make out something from the above trace.
Let me know if you want me to add more tracing.
How soon is that after boot-up?
I'm starting to wonder whether the num_possible_cpus() value used in
mm_cid_size() and mm_init_cid(), for mm allocation and initialization
respectively, may be read before it is initialized by the boot-up
sequence.
That's far-fetched, but it would be good if we could double-check that
those are never called before the last call to init_cpu_possible() and
set_cpu_possible().
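To make the hypothesis concrete, here is a small userspace model (not kernel code, and not a reproduction of the trace). It assumes the mask is zeroed using whatever CPU count is visible at init time, so a count that is still too small leaves stale bits behind once the real count is known. All names (model_possible_cpus, model_init_mask(), stale_bits(), POSSIBLE_CPUS_DEMO) are hypothetical stand-ins, and the 0xff fill merely simulates memory that was not zeroed at allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BITS_PER_WORD 64

/* Hypothetical stand-in for the value num_possible_cpus() would return. */
static unsigned int model_possible_cpus;

/* Number of 64-bit words needed to hold nbits. */
static size_t words_for(unsigned int nbits)
{
	return (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Zero only the words covering the CPU count visible right now. */
static void model_init_mask(uint64_t *mask)
{
	memset(mask, 0, words_for(model_possible_cpus) * sizeof(*mask));
}

/* Count the bits still set within the first nbits of the mask. */
static unsigned int stale_bits(const uint64_t *mask, unsigned int nbits)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < nbits; i++) {
		if (mask[i / BITS_PER_WORD] & (1ULL << (i % BITS_PER_WORD)))
			count++;
	}
	return count;
}

#ifdef POSSIBLE_CPUS_DEMO
int main(void)
{
	uint64_t mask[6];

	memset(mask, 0xff, sizeof(mask));	/* simulate non-zeroed memory */
	model_possible_cpus = 1;		/* count read too early (assumed) */
	model_init_mask(mask);
	model_possible_cpus = 384;		/* count after boot completes */
	printf("stale bits: %u\n", stale_bits(mask, model_possible_cpus));
	return 0;
}
#endif
```

In this model, initializing with a count of 1 clears a single word and leaves 320 of the 384 bits set, whereas initializing with the final count leaves none. Whether the real allocation and init paths can observe such an early value is exactly what needs checking.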
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com