Re: [RFC PATCH v2 0/7] Defer throttle when task exits to user
From: K Prateek Nayak
Date: Tue Apr 15 2025 - 04:45:23 EST
(+ Sebastian)
Hello Jan,
On 4/15/2025 11:39 AM, Jan Kiszka wrote:
>>> Attached the bits with which we succeeded, sometimes. Setup: Debian 12,
>>> RT kernel, 2-4 cores VM, 1-5 instances of the test, 2 min - 2 h
>>> patience. As we have to succeed with at least 3 race conditions in a
>>> row, that is still not bad... But maybe someone has an idea how to
>>> increase probabilities further.
>> Looking at run.sh, there are only fair tasks with one of them being run
>> with cfs bandwidth constraints. Are you saying something goes wrong on
>> PREEMPT_RT as a result of using bandwidth control on fair tasks?
> Yes, exactly. Also our in-field workload that triggers (most likely)
> this issue is not using RT tasks itself. Only kernel threads are RT here.
>> What exactly is the symptom you are observing? Does one of the assert()
>> trip during the run? Do you see a stall logged on dmesg? Can you provide
>> more information on what to expect in this 2min - 2hr window?
> I've just lost my traces from yesterday ("you have 0 minutes to find a
> power adapter"), but I got nice RCU stall warnings in the VM, including
> backtraces from the involved tasks (minus the read-lock holder IIRC).
> Maybe Florian can drop one of his dumps.
So I ran your reproducer on a 2vCPU VM running v6.15-rc1 PREEMPT_RT
and I saw:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-...!: (15000 ticks this GP) idle=8a74/0/0x1 softirq=0/0 fqs=0
rcu: (t=15001 jiffies g=12713 q=24 ncpus=2)
rcu: rcu_preempt kthread timer wakeup didn't happen for 15000 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu: Possible timer handling issue on cpu=0 timer-softirq=17688
rcu: rcu_preempt kthread starved for 15001 jiffies! g12713 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:I stack:0 pid:17 tgid:17 ppid:2 task_flags:0x208040 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x401/0x15a0
? srso_alias_return_thunk+0x5/0xfbef5
? lock_timer_base+0x77/0xb0
? srso_alias_return_thunk+0x5/0xfbef5
? __pfx_rcu_gp_kthread+0x10/0x10
schedule+0x27/0xd0
schedule_timeout+0x76/0x100
? __pfx_process_timeout+0x10/0x10
rcu_gp_fqs_loop+0x10a/0x4b0
rcu_gp_kthread+0xd3/0x160
kthread+0xff/0x210
? rt_spin_lock+0x3c/0xc0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.15.0-rc1-test-dirty #746 PREEMPT_{RT,(full)}
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:pv_native_safe_halt+0xf/0x20
Code: 22 df e9 1f 08 e5 fe 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 85 96 15 00 fb f4 <e9> f7 07 e5 fe 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
RSP: 0018:ffffffff95803e50 EFLAGS: 00000216
RAX: ffff8e2d61534000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000081f8a6c
RBP: ffffffff9581d280 R08: 0000000000000000 R09: ffff8e2cf7d32301
R10: ffff8e2be11ae5c8 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000147b0
FS: 0000000000000000(0000) GS:ffff8e2d61534000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e77c3a5128 CR3: 000000010ff78003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
default_idle+0x9/0x20
default_idle_call+0x30/0x100
do_idle+0x20f/0x250
? do_idle+0xb/0x250
cpu_startup_entry+0x29/0x30
rest_init+0xde/0x100
start_kernel+0x733/0xb20
? copy_bootdata+0x9/0xb0
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0xba/0x110
common_startup_64+0x13e/0x141
</TASK>
Is this in line with what you are seeing?
>> Additionally, do you have RT throttling enabled in your setup? Can long
>> running RT tasks starve fair tasks on your setup?
> RT throttling is enabled (default settings) but was not kicking in - why
> should it in that scenario? The only RT thread, ktimers, ran into the
> held lock and stopped.
> Jan
--
Thanks and Regards,
Prateek