Re: [tip: sched/core] sched/topology: Compute sd_weight considering cpuset partitions
From: Chen, Yu C
Date: Sat Mar 21 2026 - 03:33:40 EST
On 3/21/2026 11:36 AM, K Prateek Nayak wrote:
Hello Nathan,
Thank you for the report.
On 3/21/2026 5:28 AM, Nathan Chancellor wrote:
$ cat kernel/configs/schedstats.config
CONFIG_SCHEDSTATS=y
Is the "schedstats.config" available somewhere? I tried these
steps on my end but couldn't reproduce the crash with my config.
Also, are you saying it is necessary to enable CONFIG_SCHEDSTATS
to observe the crash?
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- mrproper defconfig schedstats.config zImage
$ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/arm-rootfs.cpio.zst | zstd -d >rootfs.cpio
$ qemu-system-arm \
-display none \
-nodefaults \
-no-reboot \
-machine virt \
-append 'console=ttyAMA0 earlycon' \
-kernel arch/arm/boot/zImage \
-initrd rootfs.cpio \
-m 1G \
-serial mon:stdio
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 7.0.0-rc4-00017-g8e8e23dea43e (nathan@framework-amd-ryzen-maxplus-395) (arm-linux-gnueabi-gcc (GCC) 15.2.0, GNU ld (GNU Binutils) 2.45) #1 SMP Fri Mar 20 16:12:05 MST 2026
...
[ 0.031929] 8<--- cut here ---
[ 0.031999] Unable to handle kernel NULL pointer dereference at virtual address 00000000 when write
[ 0.032172] [00000000] *pgd=00000000
[ 0.032459] Internal error: Oops: 805 [#1] SMP ARM
[ 0.032902] Modules linked in:
[ 0.033466] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-00017-g8e8e23dea43e #1 VOLUNTARY
[ 0.033658] Hardware name: Generic DT based system
[ 0.033770] PC is at build_sched_domains+0x7d0/0x1628
For me, this points to:
$ scripts/faddr2line vmlinux build_sched_domains+0x7d0/0x1628
I suppose we might need to use arm-linux-gnueabi-addr2line, just
in case there is a toolchain mismatch.
build_sched_domains+0x7d0/0x1628:
find_next_bit_wrap at include/linux/find.h:455
(inlined by) build_sched_groups at kernel/sched/topology.c:1255
(inlined by) build_sched_domains at kernel/sched/topology.c:2603
which is the following loop:
span = sched_domain_span(sd);
for_each_cpu_wrap(i, span, cpu) /* Here */ {
...
}
in build_sched_groups(), so we are likely running past the end of the
allocated cpumask. But before that, the caller does:
sd->span_weight = cpumask_weight(sched_domain_span(sd));
which should have crashed as well if we had a NULL pointer in the
cpumask range. So I'm at a loss. Maybe the PC points to a
different location in your build?
A wild guess: the major change is that we access sd->span before
initializing the sd structure with *sd = { ... }. The sd is allocated
via alloc_percpu() without being initialized, so the span at the end of
the sd structure remains uninitialized. It is unclear how
cpumask_weight(sd->span) might be affected by this uninitialized state.
Before this patch, after *sd = { ... } is executed, the contents of
sd->span are explicitly set to 0, which might be safer?
Thanks,
Chenyu