Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
From: David Rientjes
Date: Thu Jan 09 2025 - 12:59:23 EST
On Tue, 7 Jan 2025, Josh Don wrote:
> > Right, I think this should be entirely up to what the admin configures in
> > debugfs. If they elect to disable latency_warn_once, we'll simply emit
> > the information as often as they specify in latency_warn_ms and not add
> > our own ratelimiting on top. If they have a preference for lots of
> > logging, so be it, let's not hide that data.
>
> Your change doesn't reset rq->last_seen_need_resched_ns, so now
> without the ratelimit I think we'll get a dump every single tick until
> we eventually reschedule.
>
> Another potential benefit to the ratelimit is that if we have
> something wedging multiple cpus concurrently, we don't spam the log
> (if warn_once is disabled). Though, probably an unlikely occurrence.
>
> I think if you modify the patch to reset last_seen_need_resched_ns
> that'll give the behavior you're after.
>
Thanks Josh for pointing this out! I'm surprised by the implementation
here: even though it's only under CONFIG_SCHED_DEBUG, we take the
function call on every tick only to find that the ratelimit makes it a
no-op :/
Is that worth improving as well?
Otherwise, please take a look: is this what you had in mind?
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5659,8 +5659,10 @@ void sched_tick(void)
 	rq_unlock(rq, &rf);
-	if (sched_feat(LATENCY_WARN) && resched_latency)
+	if (sched_feat(LATENCY_WARN) && resched_latency) {
 		resched_latency_warn(cpu, resched_latency);
+		rq->last_seen_need_resched_ns = 0;
+	}
 	perf_event_task_tick();
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();
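
To illustrate the difference Josh describes, here is a rough user-space
sketch (not kernel code) of the per-tick check. It assumes a 1 ms tick and
a simplified latency test modeled loosely on the scheduler's; the sim_*
names are hypothetical and exist only for this example. It counts how many
warnings would fire while need_resched stays set, with and without
resetting last_seen_need_resched_ns after each warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue state; not the kernel's struct rq. */
struct sim_rq {
	uint64_t last_seen_need_resched_ns;
	int ticks_without_resched;
};

#define SIM_TICK_NS       1000000ULL	/* assumption: 1 ms tick (HZ=1000) */
#define SIM_NSEC_PER_MSEC 1000000ULL

/*
 * Simplified model of the per-tick latency check. Returns the latency in ns
 * once need_resched has been pending for longer than warn_ms, otherwise 0.
 * The warn_once and need_resched() checks are deliberately left out.
 */
uint64_t sim_resched_latency(struct sim_rq *rq, uint64_t now, uint64_t warn_ms)
{
	uint64_t latency;

	if (!rq->last_seen_need_resched_ns) {
		/* First tick of a new measurement window: just record the start. */
		rq->last_seen_need_resched_ns = now;
		rq->ticks_without_resched = 0;
		return 0;
	}

	rq->ticks_without_resched++;
	latency = now - rq->last_seen_need_resched_ns;
	if (latency <= warn_ms * SIM_NSEC_PER_MSEC)
		return 0;

	return latency;
}

/* Count warnings over nr_ticks ticks during which need_resched is stuck set. */
int sim_count_warnings(int nr_ticks, uint64_t warn_ms, bool reset_after_warn)
{
	struct sim_rq rq = { 0 };
	int warnings = 0;

	for (int tick = 1; tick <= nr_ticks; tick++) {
		uint64_t now = (uint64_t)tick * SIM_TICK_NS;

		if (sim_resched_latency(&rq, now, warn_ms)) {
			warnings++;
			/* Models the reset added to sched_tick() in the patch above. */
			if (reset_after_warn)
				rq.last_seen_need_resched_ns = 0;
		}
	}

	return warnings;
}
```

Under these assumptions, sim_count_warnings(1000, 100, false) reports a
warning on every tick once the threshold is crossed, while passing true
for reset_after_warn yields roughly one warning per latency_warn_ms
window.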