Re: [PATCH 00/24] Complete EEVDF

From: Sven Schnelle
Date: Tue Sep 10 2024 - 07:46:51 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> Hi all,
>
> So after much delay this is hopefully the final version of the EEVDF patches.
> They've been sitting in my git tree for ever it seems, and people have been
> testing it and sending fixes.
>
> I've spent the last two days testing and fixing cfs-bandwidth, and as far
> as I know that was the very last issue holding it back.
>
> These patches apply on top of queue.git sched/dl-server, which I plan on merging
> in tip/sched/core once -rc1 drops.
>
> I'm hoping to then merge all this (+- the DVFS clock patch) right before -rc2.
>
>
> Aside from a ton of bug fixes -- thanks all! -- new in this version is:
>
> - split up the huge delay-dequeue patch
> - tested/fixed cfs-bandwidth
> - PLACE_REL_DEADLINE -- preserve the relative deadline when migrating
> - SCHED_BATCH is equivalent to RESPECT_SLICE
> - propagate min_slice up cgroups
> - CLOCK_THREAD_DVFS_ID

I'm seeing crashes/warnings like the following on s390 with linux-next 20240909:

Sometimes the system doesn't manage to print an oops; this one is the best I got:

[ 596.146142] ------------[ cut here ]------------
[ 596.146161] se->sched_delayed
[ 596.146166] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:13131 __set_next_task_fair.part.0+0x350/0x400
[ 596.146179] Modules linked in: [..]
[ 596.146288] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0-rc7-next-20240909 #18
[ 596.146294] Hardware name: IBM 3931 A01 704 (LPAR)
[ 596.146298] Krnl PSW : 0404e00180000000 001a9c2b5eea4ea4 (__set_next_task_fair.part.0+0x354/0x400)
[ 596.146307] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 596.146314] Krnl GPRS: 001c000300000027 001c000300000023 0000000000000011 0000000000000004
[ 596.146319] 0000000000000001 001a9c2b5f1fb118 000000036ef94dd5 0000001b77ca6ea8
[ 596.146323] 001c000000000000 001a9c2b5eec6fc0 0000001b77ca6000 00000000b7334800
[ 596.146328] 0000000000000000 001a9c2b5eefad70 001a9c2b5eea4ea0 001a9bab5ee8f9f8
[ 596.146340] Krnl Code: 001a9c2b5eea4e94: c0200121bbe6 larl %r2,001a9c2b612dc660
[ 596.146340] 001a9c2b5eea4e9a: c0e5fff9e9d3 brasl %r14,001a9c2b5ede2240
[ 596.146340] #001a9c2b5eea4ea0: af000000 mc 0,0
[ 596.146340] >001a9c2b5eea4ea4: a7f4fe83 brc 15,001a9c2b5eea4baa
[ 596.146340] 001a9c2b5eea4ea8: c0e50038ba2c brasl %r14,001a9c2b5f5bc300

[ 596.146558] CPU: 1 UID: 0 PID: 18582 Comm: prctl-sched-cor Tainted: G W 6.11.0-rc7-next-20240909 #18
[ 596.146564] Tainted: [W]=WARN
[ 596.146567] Hardware name: IBM 3931 A01 704 (LPAR)
[ 596.146570] Krnl PSW : 0404e00180000000 001a9c2b5eec2de4 (dequeue_entity+0xe64/0x11f0)
[ 596.146578] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 596.146584] Krnl GPRS: 001c000300000027 001c000300000023 000000000000001a 0000000000000004
[ 596.146589] 0000000000000001 001a9c2b5f1fb118 001a9c2b61be7144 0000000016e6692a
[ 596.146593] 0000000000000001 00000000b7334951 0000000158494800 00000000b7334900
[ 596.146597] 000000000000489e 0000000000000009 001a9c2b5eec2de0 001a9bab75dff760
[ 596.146607] Krnl Code: 001a9c2b5eec2dd4: c0200120cdf6 larl %r2,001a9c2b612dc9c0
[ 596.146607] 001a9c2b5eec2dda: c0e5fff8fa33 brasl %r14,001a9c2b5ede2240
[ 596.146607] #001a9c2b5eec2de0: af000000 mc 0,0
[ 596.146607] >001a9c2b5eec2de4: c004fffff90a brcl 0,001a9c2b5eec1ff8
[ 596.146607] 001a9c2b5eec2dea: a7f4fbbe brc 15,001a9c2b5eec2566
[ 596.146607] 001a9c2b5eec2dee: a7d10001 tmll %r13,1
[ 596.146607] 001a9c2b5eec2df2: a774fb1c brc 7,001a9c2b5eec242a
[ 596.146607] 001a9c2b5eec2df6: a7f4f95f brc 15,001a9c2b5eec20b4
[ 596.146637] Call Trace:
[ 596.146640] [<001a9c2b5eec2de4>] dequeue_entity+0xe64/0x11f0
[ 596.146645] ([<001a9c2b5eec2de0>] dequeue_entity+0xe60/0x11f0)
[ 596.146650] [<001a9c2b5eec34b0>] dequeue_entities+0x340/0xe10
[ 596.146655] [<001a9c2b5eec4208>] dequeue_task_fair+0xb8/0x5a0
[ 596.146660] [<001a9c2b6115ab68>] __schedule+0xb58/0x14f0
[ 596.146666] [<001a9c2b6115b59c>] schedule+0x9c/0x240
[ 596.146670] [<001a9c2b5edf5190>] do_wait+0x160/0x440
[ 596.146676] [<001a9c2b5edf5936>] kernel_waitid+0xd6/0x110
[ 596.146680] [<001a9c2b5edf5b4e>] __do_sys_waitid+0x1de/0x1f0
[ 596.146685] [<001a9c2b5edf5c36>] __s390x_sys_waitid+0xd6/0x120
[ 596.146690] [<001a9c2b5ed0cbd6>] do_syscall+0x2f6/0x430
[ 596.146695] [<001a9c2b611543a4>] __do_syscall+0xa4/0x170
[ 596.146700] [<001a9c2b6117046c>] system_call+0x74/0x98
[ 596.146705] Last Breaking-Event-Address:
[ 596.146707] [<001a9c2b5ede2418>] __warn_printk+0x1d8/0x1e0

This happens when running the strace test suite. The system normally has
128 CPUs, and with that configuration the crash doesn't happen. But when
all but four CPUs are disabled and 'make check -j16' is run in the strace
test suite, the crash is almost always reproducible.
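
For readers skimming the dump above: the WARN text is the asserted
condition itself, i.e. __set_next_task_fair tripped over an entity whose
sched_delayed flag was still set when it was about to become the next
task. Below is a minimal, self-contained illustration of that assertion
pattern; the struct and the SCHED_WARN_ON macro here are simplified
stand-ins for illustration only, not the real fair.c code:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the scheduler entity; only the flag of interest. */
struct sched_entity {
	bool sched_delayed;	/* set while the entity sits in a delayed dequeue */
};

/* Stand-in for SCHED_WARN_ON(): print the condition text, as in the oops. */
#define SCHED_WARN_ON(cond) \
	do { if (cond) fprintf(stderr, "WARNING: %s\n", #cond); } while (0)

static void set_next_task_check(struct sched_entity *se)
{
	/* Picking a still-delayed entity as the next task is unexpected. */
	SCHED_WARN_ON(se->sched_delayed);
}

int main(void)
{
	struct sched_entity se = { .sched_delayed = true };

	set_next_task_check(&se);	/* emits: WARNING: se->sched_delayed */
	return 0;
}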

Regards
Sven