Re: [patch 0/4] sched/mmcid: Cure fork()/vfork() related problems


From: Matthieu Baerts

Date: Wed Mar 11 2026 - 06:20:24 EST

Hi Thomas,

On 10/03/2026 21:28, Thomas Gleixner wrote:
> Matthieu and Jiri reported CPU stalls where a CPU got stuck in mm_get_cid():
>
> https://lore.kernel.org/b24ffcb3-09d5-4e48-9070-0b69bc654281@xxxxxxxxxx
>
> After some tedious debugging it turned out to be another subtle (or not so
> subtle) ownership mode change issue.
>
> The logic handling vfork()'ed tasks in sched_mmcid_fixup_tasks_to_cpus() is
> broken. It is invoked when the number of tasks associated with a process is
> smaller than the number of MMCID users. It then walks the task list to find
> the vfork()'ed task, but also accounts for the tasks it has already processed.
>
> If that double processing brings the number of tasks still to be handled to 0,
> the walk stops and the vfork()'ed task's CID is not fixed up. As a
> consequence, a subsequent schedule-in fails to acquire a (transitional) CID
> and the machine stalls.
>
> Peter and I also discovered a yet-unreported issue with concurrent
> forks. Jiri noticed it independently.
>
> The following series fixes those issues. It applies on top of Linus' tree.

Thank you for this series!

My CI also complained about the missing "#ifdef CONFIG_SCHED_MM_CID"
with "make tinyconfig", but otherwise I had no issues booting it 200
times! Just in case you need this tag:

Tested-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.