Re: [patch V3 00/12] rseq: Implement time slice extension mechanism

From: Mathieu Desnoyers

Date: Mon Nov 10 2025 - 12:15:42 EST


On 2025-11-10 09:23, Mathieu Desnoyers wrote:
On 2025-11-06 12:28, Prakash Sangappa wrote:
[...]
Hit this watchdog panic.

Using the following tree. I assume this is the latest.
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/ rseq/slice

Appears to be spinning in mm_get_cid(). Must be the mm cid changes.
https://lore.kernel.org/all/20251029123717.886619142@xxxxxxxxxxxxx/

When this happened during the development of the "complex" mm_cid
scheme, this was typically caused by a stale "mm_cid" being kept around
by a task even though it was not actually scheduled, thus causing
over-reservation of concurrency IDs beyond the max_cids threshold. This
ends up looping in:

static inline unsigned int mm_get_cid(struct mm_struct *mm)
{
        unsigned int cid = __mm_get_cid(mm, READ_ONCE(mm->mm_cid.max_cids));

        while (cid == MM_CID_UNSET) {
                cpu_relax();
                cid = __mm_get_cid(mm, num_possible_cpus());
        }
        return cid;
}
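
To make the failure mode concrete, here is a minimal userspace sketch of
the same retry pattern. It is a toy model under stated assumptions, not
the kernel implementation: every toy_* name, TOY_NR_CIDS and
TOY_CID_UNSET are hypothetical stand-ins, and the spin count is bounded
only so that the stuck case can be observed instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CIDS   4U       /* hypothetical stand-in for num_possible_cpus() */
#define TOY_CID_UNSET (~0U)    /* hypothetical stand-in for MM_CID_UNSET */

/* One flag per concurrency ID; true means some task currently holds it. */
static bool toy_cid_used[TOY_NR_CIDS];

/* Claim the lowest free ID below max, or TOY_CID_UNSET if none is free. */
static unsigned int toy_try_get_cid(unsigned int max)
{
        for (unsigned int i = 0; i < max && i < TOY_NR_CIDS; i++) {
                if (!toy_cid_used[i]) {
                        toy_cid_used[i] = true;
                        return i;
                }
        }
        return TOY_CID_UNSET;
}

/* Release an ID. A task that keeps a stale ID never gets here. */
static void toy_put_cid(unsigned int cid)
{
        assert(cid < TOY_NR_CIDS);
        toy_cid_used[cid] = false;
}

/*
 * Same shape as the retry loop above: try below max_cids first, then
 * retry over the full range. Unlike the kernel loop, this one gives up
 * after max_spins retries and reports how many retries it made.
 */
static unsigned int toy_get_cid_bounded(unsigned int max_cids,
                                        unsigned int max_spins,
                                        unsigned int *spins)
{
        unsigned int cid = toy_try_get_cid(max_cids);

        *spins = 0;
        while (cid == TOY_CID_UNSET && *spins < max_spins) {
                (*spins)++;
                cid = toy_try_get_cid(TOY_NR_CIDS);
        }
        return cid;
}
```

If every ID is held and none is ever released (the stale-holder case),
toy_get_cid_bounded() exhausts its retries and returns TOY_CID_UNSET; an
unbounded loop would spin forever at that point. Once any holder calls
toy_put_cid(), the next retry succeeds.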

Based on the stacktrace you provided, it seems to happen within
sched_mm_cid_fork() within copy_process, so perhaps it's simply an
initialization issue in fork, or an issue when cloning a new thread?

One possible issue here: I note that kernel/sched/core.c:mm_init_cid()
misses the following initialization:

mm->mm_cid.transit = 0;

Thanks,

Mathieu






--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com