RE: [PATCH 0/4] sched: Various reweight_entity() fixes

From: Doug Smythies

Date: Tue Feb 03 2026 - 11:36:55 EST

Hi All,

On 2026.02.03 04:19 K Prateek Nayak wrote:
> On 2/3/2026 4:41 PM, Peter Zijlstra wrote:
>> On Tue, Feb 03, 2026 at 12:15:56PM +0530, K Prateek Nayak wrote:
>>> On 1/30/2026 3:04 PM, Peter Zijlstra wrote:
>>>> Two issues related to reweight_entity() were raised; poking at all that got me
>>>> these patches.
>>>>
>>>> They're in queue.git/sched/core and I spend most of yesterday staring at traces
>>>> trying to find anything wrong. So far, so good.
>>>>
>>>> Please test.
>>>
>>> I put this on top of tip:sched/urgent + tip:sched/core which contains Ingo's
>>> cleanup of removing the union and at some point in the benchmark run I hit:
>>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000051

... snip ...

> This trips when I'm running a (very) old version of schbench at commit
> e4aa540 ("Make sure rps isn't zero in auto_rps mode.")
>
> I'm running the following on a 512 CPU server:
>
> #!/bin/bash
>
> DIR=$1
> MESSENGERS=1
> MAX_ITERS=2
> SCHBENCH=./schbench
>
> for i in 1 2 4 8 16 32 64 128 256 512 768 1024;
> do
>     THISDIR=$DIR/$i-workers
>     if [ ! -d $THISDIR ]
>     then
>         mkdir -p $THISDIR
>     fi
>     for j in `seq 0 $MAX_ITERS`
>     do
>         echo "===== Worker $i : Iter $j ======";
>         $SCHBENCH -m $MESSENGERS -t $i |& tee $THISDIR/iter-$j.log;
>         sleep 2
>     done
> done
>
> Fails when it is running with 768 workers. Standalone runs didn't
> fail - have to run a cumulative runner that runs sched-messaging,
> stream, tbench, netperf, first before running schbench :-(

Further to my email from the other day, where all was good [1],
I have continued to test, in particular the severe overload conditions
from [2].

Under heavy overload my test computer just hangs. My multiple
ssh sessions eventually terminate. I have left it for many hours, but
have to reset it in the end.
The first time there were no log entries at all, at least that I could
find.
The second time:
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000051
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 11 UID: 1000 PID: 3597 Comm: yes Not tainted 6.19.0-rc1-pz #1 PREEMPT(full)
...

The entire relevant part is attached.

Conditions:
Greater than 12,500 X (yes > /dev/null) tasks
But less than 15,000 X (yes > /dev/null) tasks

I have tested up to 20,000 X (yes > /dev/null) tasks
with previous kernels, including mainline 6.19-rc1.
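
For anyone wanting to try a similar load, here is a minimal sketch of one
way to start N (yes > /dev/null) tasks. It is not necessarily the script
used for the tests above; the helper names spawn_yes_tasks and
stop_yes_tasks and the YES_PIDS variable are hypothetical, made up for
this illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original tests): start and stop
# N background "yes > /dev/null" tasks.

# Space-separated list of the PIDs started so far.
YES_PIDS=""

spawn_yes_tasks() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each task spins at 100% CPU writing to /dev/null.
        yes > /dev/null &
        YES_PIDS="$YES_PIDS $!"
        i=$((i + 1))
    done
}

stop_yes_tasks() {
    for pid in $YES_PIDS; do
        # Ignore errors for tasks that have already exited.
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    YES_PIDS=""
}
```

Something like "spawn_yes_tasks 13000" would land in the range given
under Conditions, assuming the per-user process limit (ulimit -u) allows
that many tasks. Start with a small count first, since the whole point
is that the machine may not respond afterwards.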

I would not disagree if you say my operating conditions
are ridiculous.

System:
Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz, 6 cores 12 CPUs.
CPU frequency scaling driver: intel_pstate; Governor powersave.
HWP: Enabled

[1] https://lore.kernel.org/lkml/000d01dc939e$0fc99fe0$2f5cdfa0$@telus.net/
[2] https://lore.kernel.org/lkml/002401dbb6bd$4527ec00$cf77c400$@telus.net/

... Doug

Attachment: kern.log
Description: Binary data