Re: [REGRESSION] Re: [PATCH 00/24] Complete EEVDF

From: Luis Machado
Date: Tue Dec 10 2024 - 03:46:45 EST


On 11/28/24 10:32, Marcel Ziswiler wrote:
> Hi all,
>
> On Sat, 2024-07-27 at 12:27 +0200, Peter Zijlstra wrote:
>> Hi all,
>>
>> So after much delay this is hopefully the final version of the EEVDF patches.
>> They've been sitting in my git tree for ever it seems, and people have been
>> testing it and sending fixes.
>>
>> I've spend the last two days testing and fixing cfs-bandwidth, and as far
>> as I know that was the very last issue holding it back.
>>
>> These patches apply on top of queue.git sched/dl-server, which I plan on merging
>> in tip/sched/core once -rc1 drops.
>>
>> I'm hoping to then merge all this (+- the DVFS clock patch) right before -rc2.
>>
>>
>> Aside from a ton of bug fixes -- thanks all! -- new in this version is:
>>
>>  - split up the huge delay-dequeue patch
>>  - tested/fixed cfs-bandwidth
>>  - PLACE_REL_DEADLINE -- preserve the relative deadline when migrating
>>  - SCHED_BATCH is equivalent to RESPECT_SLICE
>>  - propagate min_slice up cgroups
>>  - CLOCK_THREAD_DVFS_ID
>
> We found the following 7 commits from this patch set to crash in enqueue_dl_entity():
>
> 54a58a787791 sched/fair: Implement DELAY_ZERO
> 152e11f6df29 sched/fair: Implement delayed dequeue
> e1459a50ba31 sched: Teach dequeue_task() about special task states
> a1c446611e31 sched,freezer: Mark TASK_FROZEN special
> 781773e3b680 sched/fair: Implement ENQUEUE_DELAYED
> f12e148892ed sched/fair: Prepare pick_next_task() for delayed dequeue
> 2e0199df252a sched/fair: Prepare exit/cleanup paths for delayed_dequeue
>
> Resulting in the following crash dump (this is running v6.12.1):
>
> [ 14.652856] sched: DL replenish lagged too much
> [ 16.572706] ------------[ cut here ]------------
> [ 16.573115] WARNING: CPU: 5 PID: 912 at kernel/sched/deadline.c:1995 enqueue_dl_entity+0x46c/0x55c
> [ 16.573900] Modules linked in: overlay crct10dif_ce rk805_pwrkey snd_soc_es8316 pwm_fan
> phy_rockchip_naneng_combphy rockchip_saradc rtc_hym8563 industrialio_trigg
> ered_buffer kfifo_buf rockchip_thermal phy_rockchip_usbdp typec spi_rockchip_sfc snd_soc_rockchip_i2s_tdm
> hantro_vpu v4l2_vp9 v4l2_h264 v4l2_jpeg panthor v4l2_mem2me
> m rockchipdrm drm_gpuvm drm_exec drm_shmem_helper analogix_dp gpu_sched dw_mipi_dsi dw_hdmi cec
> drm_display_helper snd_soc_audio_graph_card snd_soc_simple_card_utils
> drm_dma_helper drm_kms_helper cfg80211 rfkill pci_endpoint_test drm backlight dm_mod dax
> [ 16.578350] CPU: 5 UID: 0 PID: 912 Comm: job10 Not tainted 6.12.1-dirty #15
> [ 16.578956] Hardware name: radxa Radxa ROCK 5B/Radxa ROCK 5B, BIOS 2024.10-rc3 10/01/2024
> [ 16.579667] pstate: 204000c9 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 16.580273] pc : enqueue_dl_entity+0x46c/0x55c
> [ 16.580661] lr : dl_server_start+0x44/0x12c
> [ 16.581028] sp : ffff80008002bc00
> [ 16.581318] x29: ffff80008002bc00 x28: dead000000000122 x27: 0000000000000000
> [ 16.581941] x26: 0000000000000001 x25: 0000000000000000 x24: 0000000000000009
> [ 16.582563] x23: ffff33c976db0e40 x22: 0000000000000001 x21: 00000000002dc6c0
> [ 16.583186] x20: 0000000000000001 x19: ffff33c976db17a8 x18: 0000000000000000
> [ 16.583808] x17: ffff5dd9779ac000 x16: ffff800080028000 x15: 11c3485b851e0698
> [ 16.584430] x14: 11b4b257e4156000 x13: 0000000000000255 x12: 0000000000000000
> [ 16.585053] x11: ffff33c976db0ec0 x10: 0000000000000000 x9 : 0000000000000009
> [ 16.585674] x8 : 0000000000000005 x7 : ffff33c976db19a0 x6 : ffff33c78258b440
> [ 16.586296] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [ 16.586918] x2 : 0000000000000001 x1 : 0000000000000001 x0 : ffff33c798e112f0
> [ 16.587540] Call trace:
> [ 16.587754] enqueue_dl_entity+0x46c/0x55c
> [ 16.588113] dl_server_start+0x44/0x12c
> [ 16.588449] enqueue_task_fair+0x124/0x49c
> [ 16.588807] enqueue_task+0x3c/0xe0
> [ 16.589113] ttwu_do_activate.isra.0+0x6c/0x208
> [ 16.589511] try_to_wake_up+0x1d0/0x61c
> [ 16.589847] wake_up_process+0x18/0x24
> [ 16.590175] kick_pool+0x84/0x150
> [ 16.590467] __queue_work+0x2f4/0x544
> [ 16.590788] delayed_work_timer_fn+0x1c/0x28
> [ 16.591161] call_timer_fn+0x34/0x1ac
> [ 16.591481] __run_timer_base+0x20c/0x314
> [ 16.591832] run_timer_softirq+0x3c/0x78
> [ 16.592176] handle_softirqs+0x124/0x35c
> [ 16.592520] __do_softirq+0x14/0x20
> [ 16.592827] ____do_softirq+0x10/0x1c
> [ 16.593148] call_on_irq_stack+0x24/0x4c
> [ 16.593490] do_softirq_own_stack+0x1c/0x2c
> [ 16.593857] irq_exit_rcu+0x8c/0xc0
> [ 16.594163] el0_interrupt+0x48/0xbc
> [ 16.594477] __el0_irq_handler_common+0x18/0x24
> [ 16.594874] el0t_64_irq_handler+0x10/0x1c
> [ 16.595232] el0t_64_irq+0x190/0x194
> [ 16.595545] ---[ end trace 0000000000000000 ]---
> [ 16.595950] ------------[ cut here ]------------


Random piece of data, but I also had some difficulty making things boot on Android when trying
Vincent's nr_running accounting series due to a very similar crash/stack trace. Though what I
saw went a bit further and actually crashed within task_contending, called from
enqueue_dl_entity. Possibly crashed in one of the inlined functions.

Even though the kernel was 6.8 and it was backports, it seems awfully similar to the above.

>
> It looks like it is trying to enqueue an already queued deadline task. Full serial console log available [1].
>
> We are running the exact same scheduler stress test both on Intel NUCs as well as RADXA ROCK 5B board farms.
> While so far we have not seen this on amd64 it crashes consistently/reproducible on aarch64.
>
> We haven't had time to do a non-proprietary reproduction case as of yet but I wanted to report our current
> findings asking for any feedback/suggestions from the community.
>
> Thanks!
>
> Cheers
>
> Marcel
>
> [1] https://hastebin.skyra.pw/hoqesigaye.yaml
>