Re: [patch V3 00/20] sched: Rewrite MM CID management
From: Shrikanth Hegde
Date: Thu Oct 30 2025 - 01:01:10 EST
Hi Thomas.
On 10/29/25 6:38 PM, Thomas Gleixner wrote:
This is a follow-up on the V2 series which can be found here:
https://lore.kernel.org/20251022104005.907410538@xxxxxxxxxxxxx
The V1 cover letter contains a detailed analysis of the issues:
https://lore.kernel.org/20251015164952.694882104@xxxxxxxxxxxxx
TLDR: The CID management is way too complex and adds significant overhead
into scheduler hotpaths.
The series rewrites MM CID management in a simpler way which
focusses on low overhead in the scheduler while maintaining per task CIDs
as long as the number of threads is not exceeding the number of possible
CPUs.
The series is based on the V6 series of the rseq rewrite:
https://lore.kernel.org/20251027084220.785525188@xxxxxxxxxxxxx
which is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/rseq
The series on top of the tip core/rseq branch is available from git as
well:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/cid
Changes vs. V2:
- Rename to cpumask/bitmap_weighted_or() - Yury
- Zero the bitmap with length of bitmap_size(nr_possible_cpus()) -
Shrikanth
- Move cpu_relax() out of for() as that fails to build when cpu_relax()
is a macro. - Shrikanth
- Picked up Reviewed/Acked-by tags where appropriate
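
For readers unfamiliar with the renamed helper, the idea behind a "weighted or" can be sketched in plain userspace C: OR two bitmaps into a destination and return the number of set bits in one pass, instead of doing the OR and the weight computation separately. This is only an illustration under assumptions. The function name weighted_or(), its signature and the tail masking are hypothetical and not taken from the series, and __builtin_popcountl() is a GCC/Clang builtin.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Hypothetical userspace sketch, not the kernel implementation:
 * dst = a | b over the first @nbits bits, returning the number of
 * bits set in the result.
 */
static unsigned int weighted_or(unsigned long *dst, const unsigned long *a,
				const unsigned long *b, unsigned int nbits)
{
	size_t words = BITS_TO_LONGS(nbits);
	size_t tail = nbits % BITS_PER_LONG;
	unsigned int weight = 0;

	for (size_t i = 0; i < words; i++) {
		unsigned long v = a[i] | b[i];

		/* Ignore bits beyond @nbits in a partially used last word */
		if (i == words - 1 && tail)
			v &= (1UL << tail) - 1;

		dst[i] = v;
		weight += (unsigned int)__builtin_popcountl(v);
	}
	return weight;
}
```

With a = 0x5 and b = 0x3 over 8 bits, dst becomes 0x7 and the returned weight is 3.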
Thanks,
tglx
---
Thomas Gleixner (20):
sched/mmcid: Revert the complex CID management
sched/mmcid: Use proper data structures
sched/mmcid: Cacheline align MM CID storage
sched: Fixup whitespace damage
sched/mmcid: Move scheduler code out of global header
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Cache num_possible_cpus()
sched/mmcid: Convert mm CID mask to a bitmap
signal: Move MMCID exit out of sighand lock
sched/mmcid: Move initialization out of line
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Introduce per task/CPU ownership infrastrcuture
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Provide CID ownership mode fixup functions
irqwork: Move data struct to a types header
sched/mmcid: Implement deferred mode change
sched/mmcid: Switch over to the new mechanism
include/linux/bitmap.h | 15
include/linux/cpumask.h | 26 +
include/linux/irq_work.h | 9
include/linux/irq_work_types.h | 14
include/linux/mm_types.h | 125 ------
include/linux/rseq.h | 27 -
include/linux/rseq_types.h | 71 +++
include/linux/sched.h | 19
init/init_task.c | 3
kernel/cpu.c | 15
kernel/exit.c | 1
kernel/fork.c | 7
kernel/sched/core.c | 815 +++++++++++++++++++----------------------
kernel/sched/sched.h | 395 ++++++++-----------
kernel/signal.c | 2
lib/bitmap.c | 6
16 files changed, 727 insertions(+), 823 deletions(-)
I am running into a crash at boot on power10 pseries.
Thought of reporting it here first. I am still trying to figure out why.
I am using your tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git
commit 789ff6e7cc5aa423473eb135f94812fe77b8aeab (HEAD -> rseq/cid, origin/rseq/cid)
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Tue Oct 14 10:51:04 2025 +0200
sched/mmcid: Switch over to the new mechanism
Oops: Kernel access of bad area, sig: 7 [#3]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=8192 NUMA pSeries
Modules linked in: drm drm_panel_orientation_quirks xfs sd_mod sg ibmvscsi ibmveth scsi_transport_srp pseries_wdt dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 96 UID: 0 PID: 0 Comm: swapper/96 Tainted: G D W 6.18.0-rc3+ #4 PREEMPT(lazy)
Tainted: [D]=DIE, [W]=WARN
NIP [c0000000001b5c10] mm_cid_switch_to+0x58/0x52c
LR [c000000001117c84] __schedule+0x4bc/0x760
Call Trace:
[c00000668367fde0] [c0000000001b53c8] __pick_next_task+0x60/0x2ac (unreliable)
[c00000668367fe40] [c000000001117a14] __schedule+0x24c/0x760
[c00000668367fee0] [c0000000011183d0] schedule_idle+0x3c/0x64
[c00000668367ff10] [c0000000001f2470] do_idle+0x15c/0x1ac
[c00000668367ff60] [c0000000001f2788] cpu_startup_entry+0x4c/0x50
[c00000668367ff90] [c00000000005ef20] start_secondary+0x284/0x288
[c00000668367ffe0] [c00000000000e158] start_secondary_prolog+0x10/0x14