Re: [PATCH] sched: Further restrict the preemption modes
From: Shrikanth Hegde
Date: Thu Feb 26 2026 - 00:31:22 EST
On 2/26/26 6:18 AM, Steven Rostedt wrote:
> On Wed, 25 Feb 2026 11:53:45 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> Oh, that reminds me, Steve, would it make sense to have
>> task_struct::se.sum_exec_runtime as a trace-clock?
> That's unique per task right? As tracing is global it requires the
> clock to be monotonic, and I'm guessing a single sched_switch will
> break that.
> Now if one wants to trace how long kernel paths are, I'm sure we could
> trivially make a new tracer to do so.
> echo max_kernel_time > current_tracer
That is a good idea.
> or something like that, that could act like a latency tracer that
> monitors how long any kernel thread runs without being preempted.
> -- Steve
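A hedged sketch of what the interface could look like: the proposed max_kernel_time tracer does not exist, but the existing preemptoff latency tracer (CONFIG_PREEMPT_TRACER) is driven through the same tracefs files and records the longest section that ran with preemption disabled, which is close in spirit:

```shell
# Sketch only; paths assume tracefs is mounted at the usual location.
cd /sys/kernel/debug/tracing
echo preemptoff > current_tracer
echo 0 > tracing_max_latency    # reset the recorded maximum
sleep 10                        # let the workload run
cat tracing_max_latency         # longest preempt-off section, in usecs
cat trace                       # the call path that produced it
```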
With preempt=full/lazy a long-running kernel task can get preempted if it is
running in a preemptible section; that's okay.
My intent was to have a tracer that can say: look, this kernel task took this much time
before it completed. For some tasks, such as a long page walk, we know that is okay since
it is expected to take time, but some tasks, such as reading a watchdog, shouldn't take
long. On large systems, even doing those global variable updates may take a long time.
Updating less often was the fix that resolved that lockup, IIRC. So how can we identify such
opportunities? Hopefully I am making sense.
Earlier, one would have got a softlockup when things were making very slow progress (the
paths which didn't have a cond_resched()).
Now we don't know unless we see a workload regression.
If we don't have a tracer/mechanism today which reports kernel tasks running longer
than a time limit, then having a new one would help.