Re: [PATCH] sched: Further restrict the preemption modes
From: Shrikanth Hegde
Date: Thu Feb 26 2026 - 00:31:22 EST
On 2/26/26 6:18 AM, Steven Rostedt wrote:
> On Wed, 25 Feb 2026 11:53:45 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> Oh, that reminds me, Steve, would it make sense to have
>> task_struct::se.sum_exec_runtime as a trace-clock?
> That's unique per task right? As tracing is global it requires the
> clock to be monotonic, and I'm guessing a single sched_switch will
> break that.
> Now if one wants to trace how long kernel paths are, I'm sure we could
> trivially make a new tracer to do so.
> echo max_kernel_time > current_tracer
That is a good idea.
> or something like that, that could act like a latency tracer that
> monitors how long any kernel thread runs without being preempted.
> -- Steve
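A hedged sketch of what the interface could look like: the proposed max_kernel_time tracer does not exist, but the existing preemptoff latency tracer (CONFIG_PREEMPT_TRACER) is driven through the same tracefs files and records the longest section that ran with preemption disabled, which is close in spirit:

```shell
# Sketch only; paths assume tracefs is mounted at the usual location.
cd /sys/kernel/debug/tracing
echo preemptoff > current_tracer
echo 0 > tracing_max_latency    # reset the recorded maximum
sleep 10                        # let the workload run
cat tracing_max_latency         # longest preempt-off section, in usecs
cat trace                       # the call path that produced it
```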
With preempt=full/lazy a long-running kernel task can get preempted if it is
running in a preemptible section; that's okay.
My intent was to have a tracer that can say: look, this kernel task took this much time
before it completed. For some tasks, such as a long page walk, we know that is okay since
it is expected to take time, but some tasks, such as reading a watchdog, shouldn't take
long. On large systems, even doing those global variable updates may take a long time.
Updating less often was the fix that resolved that lockup, IIRC. So how can we identify such
opportunities? Hopefully I am making sense.
Earlier, one would have got a softlockup when things were making very slow progress (the
paths which didn't have a cond_resched()).
Now we don't know unless we see a workload regression.
If we don't have a tracer/mechanism today which reports kernel tasks running longer
than a time limit, then having a new one would help.