Re: [patch V3 00/20] sched: Rewrite MM CID management

From: Shrikanth Hegde

Date: Thu Oct 30 2025 - 01:01:10 EST


Hi Thomas.

On 10/29/25 6:38 PM, Thomas Gleixner wrote:
This is a follow up on V2 series which can be found here:

https://lore.kernel.org/20251022104005.907410538@xxxxxxxxxxxxx

The V1 cover letter contains a detailed analysis of the issues:

https://lore.kernel.org/20251015164952.694882104@xxxxxxxxxxxxx

TLDR: The CID management is way too complex and adds significant overhead
to scheduler hot paths.

The series rewrites MM CID management in a simpler way which focuses on
low overhead in the scheduler while maintaining per task CIDs as long as
the number of threads does not exceed the number of possible CPUs.
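To make the "per task CIDs as long as threads do not exceed possible CPUs" constraint concrete, here is a minimal, single-threaded sketch of a bounded CID allocator: each task takes the lowest free ID from a bitmap sized to the number of possible CPUs, and allocation fails once every ID is taken. All names (struct sketch_mm_cid, sketch_cid_get(), sketch_cid_put(), SKETCH_MAX_CIDS, SKETCH_CID_NONE) are hypothetical. This is not the series' code, it has no locking, and it does not model what the series does once threads outnumber CPUs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical upper bound standing in for the number of possible CPUs. */
#define SKETCH_MAX_CIDS 64u
/* Hypothetical sentinel: no free CID left. */
#define SKETCH_CID_NONE UINT_MAX

struct sketch_mm_cid {
	unsigned long long used; /* bit n set => CID n is owned by a task */
	unsigned int max_cids;   /* number of usable CIDs, <= SKETCH_MAX_CIDS */
};

/* Hand out the lowest free CID, or SKETCH_CID_NONE if all are in use. */
static unsigned int sketch_cid_get(struct sketch_mm_cid *mc)
{
	assert(mc->max_cids <= SKETCH_MAX_CIDS);
	for (unsigned int cid = 0; cid < mc->max_cids; cid++) {
		if (!(mc->used & (1ULL << cid))) {
			mc->used |= 1ULL << cid;
			return cid;
		}
	}
	/* More tasks than CIDs: a real implementation needs a fallback. */
	return SKETCH_CID_NONE;
}

/* Return a CID to the pool so the next allocation can reuse it. */
static void sketch_cid_put(struct sketch_mm_cid *mc, unsigned int cid)
{
	assert(cid < mc->max_cids && (mc->used & (1ULL << cid)));
	mc->used &= ~(1ULL << cid);
}
```

Handing out the lowest free bit keeps the IDs compact, so user space indexing per-CID data only touches the first few slots.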

The series is based on the V6 series of the rseq rewrite:

https://lore.kernel.org/20251027084220.785525188@xxxxxxxxxxxxx

which is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/rseq

The series on top of the tip core/rseq branch is available from git as
well:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/cid

Changes vs. V2:

- Rename to cpumask/bitmap_weighted_or() - Yury

- Zero the bitmap using a length of bitmap_size(nr_possible_cpus()) -
  Shrikanth

- Move cpu_relax() out of the for() statement, as that fails to build
  when cpu_relax() is a macro. - Shrikanth

- Picked up Reviewed/Acked-by tags where appropriate

Thanks,

tglx
---
Thomas Gleixner (20):
sched/mmcid: Revert the complex CID management
sched/mmcid: Use proper data structures
sched/mmcid: Cacheline align MM CID storage
sched: Fixup whitespace damage
sched/mmcid: Move scheduler code out of global header
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Cache num_possible_cpus()
sched/mmcid: Convert mm CID mask to a bitmap
signal: Move MMCID exit out of sighand lock
sched/mmcid: Move initialization out of line
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Introduce per task/CPU ownership infrastructure
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Provide CID ownership mode fixup functions
irqwork: Move data struct to a types header
sched/mmcid: Implement deferred mode change
sched/mmcid: Switch over to the new mechanism
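The name cpumask_weighted_or()/bitmap_weighted_or() suggests an OR of two masks into a destination that also returns the weight (number of set bits) of the result in the same pass, avoiding a separate weight computation. The following is a hypothetical userspace analogue (sketch_weighted_or() and the SKETCH_* macros are made up, and it relies on the GCC/Clang builtin __builtin_popcountl()); it is not the kernel implementation, and the kernel's signature may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/*
 * Hypothetical "weighted OR": dst = src1 | src2, returning the number of
 * set bits among the first nbits of dst.
 */
static unsigned int sketch_weighted_or(unsigned long *dst,
				       const unsigned long *src1,
				       const unsigned long *src2,
				       unsigned int nbits)
{
	size_t nlongs = SKETCH_BITS_TO_LONGS(nbits);
	unsigned int weight = 0;

	assert(nbits > 0);
	for (size_t i = 0; i < nlongs; i++) {
		unsigned long word = src1[i] | src2[i];

		dst[i] = word;
		/* Only count bits below nbits in the final word. */
		if (i == nlongs - 1 && nbits % SKETCH_BITS_PER_LONG)
			word &= (1UL << (nbits % SKETCH_BITS_PER_LONG)) - 1;
		weight += (unsigned int)__builtin_popcountl(word);
	}
	return weight;
}
```

For example, with nbits = 8, ORing 0x0F and 0x3C stores 0x3F in dst and returns 6.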

include/linux/bitmap.h | 15
include/linux/cpumask.h | 26 +
include/linux/irq_work.h | 9
include/linux/irq_work_types.h | 14
include/linux/mm_types.h | 125 ------
include/linux/rseq.h | 27 -
include/linux/rseq_types.h | 71 +++
include/linux/sched.h | 19
init/init_task.c | 3
kernel/cpu.c | 15
kernel/exit.c | 1
kernel/fork.c | 7
kernel/sched/core.c | 815 +++++++++++++++++++----------------------
kernel/sched/sched.h | 395 ++++++++-----------
kernel/signal.c | 2
lib/bitmap.c | 6
16 files changed, 727 insertions(+), 823 deletions(-)



I am running into a crash at boot on a Power10 pSeries system.
I thought I'd report it here first while I try to figure out why.

I am using your tree.
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git

commit 789ff6e7cc5aa423473eb135f94812fe77b8aeab (HEAD -> rseq/cid, origin/rseq/cid)
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Tue Oct 14 10:51:04 2025 +0200

sched/mmcid: Switch over to the new mechanism


Oops: Kernel access of bad area, sig: 7 [#3]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
Modules linked in: drm drm_panel_orientation_quirks xfs sd_mod sg ibmvscsi ibmveth scsi_transport_srp pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 96 UID: 0 PID: 0 Comm: swapper/96 Tainted: G D W 6.18.0-rc3+ #4 PREEMPT(lazy)
Tainted: [D]=DIE, [W]=WARN
NIP [c0000000001b5c10] mm_cid_switch_to+0x58/0x52c
LR [c000000001117c84] __schedule+0x4bc/0x760
Call Trace:
[c00000668367fde0] [c0000000001b53c8] __pick_next_task+0x60/0x2ac (unreliable)
[c00000668367fe40] [c000000001117a14] __schedule+0x24c/0x760
[c00000668367fee0] [c0000000011183d0] schedule_idle+0x3c/0x64
[c00000668367ff10] [c0000000001f2470] do_idle+0x15c/0x1ac
[c00000668367ff60] [c0000000001f2788] cpu_startup_entry+0x4c/0x50
[c00000668367ff90] [c00000000005ef20] start_secondary+0x284/0x288
[c00000668367ffe0] [c00000000000e158] start_secondary_prolog+0x10/0x14