Re: [PATCH] sched: Further restrict the preemption modes
From: Shrikanth Hegde
Date: Fri Feb 27 2026 - 04:11:06 EST
Hi Steven.
On 2/26/26 10:52 PM, Steven Rostedt wrote:
On Thu, 26 Feb 2026 11:00:14 +0530
Shrikanth Hegde <sshegde@xxxxxxxxxxxxx> wrote:
On 2/26/26 6:18 AM, Steven Rostedt wrote:
On Wed, 25 Feb 2026 11:53:45 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
Oh, that reminds me, Steve, would it make sense to have
task_struct::se.sum_exec_runtime as a trace-clock?
That's unique per task, right? As tracing is global, it requires the
clock to be monotonic, and I'm guessing a single sched_switch will
break that.
Now if one wants to trace how long kernel paths are, I'm sure we could
trivially make a new tracer to do so.
echo max_kernel_time > current_tracer
That is a good idea.
Yeah, I think something like this should be added, now that LAZY will
prevent us from knowing where in the kernel things are really running for a
long time.
That would be the goal.
Or something like that. It could act like a latency tracer that
monitors how long any kernel thread runs without being preempted.
-- Steve
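For reference, the existing latency tracers are already driven through tracefs in exactly this style; the sketch below uses the real irqsoff tracer as the analogue, since max_kernel_time above is only a hypothetical name (a new tracer would presumably be enabled the same way):

```shell
# Real example: the irqsoff latency tracer records the longest stretch
# of code that ran with interrupts disabled (CONFIG_IRQSOFF_TRACER).
cd /sys/kernel/tracing
echo 0 > tracing_max_latency      # reset the recorded maximum
echo irqsoff > current_tracer     # a max_kernel_time tracer would be
                                  # enabled the same way, hypothetically
cat tracing_max_latency           # worst latency seen so far (usecs)
cat trace                         # snapshot of the worst-case path
echo nop > current_tracer         # disable when done
```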
With preempt=full/lazy, a long-running kernel task can get
preempted if it is running in a preemptible section. That's okay.
My intent was to have a tracer that can say: look, this kernel task took this much time
before it completed. For some tasks, such as a long page walk, we know it is okay since
Tracers can be set to only watch a single task. The function and function
graph tracers use set_ftrace_pid. I could extend that to other tracers.
Hmm, that may even be useful for the preemptirq tracer!
it is expected to take time, but some tasks, such as reading the watchdog, shouldn't take
time. But on large systems, doing these global variable updates can itself take a long time.
Updating less often was the fix that resolved that lockup, IIRC. So how can we identify such
That was a hardlockup; wrong example.
opportunities? Hopefully I am making sense.
Not really. Can you explain in more detail, or specific examples of what
constitutes a path you want to trace and one that you do not?
All I was saying is that there have been fixes which solved softlockup issues
without using cond_resched(). But seeing the softlockup was important in
knowing that the issue existed.
Some reference commits that I think did this:
a8c861f401b4 xfs: avoid busy loops in GCD
e1b849cfa6b6 writeback: Avoid contention on wb->list_lock when switching inodes
0ddfb62f5d01 fix the softlockups in attach_recursive_mnt()
I am afraid we will have to trace all functions to begin with (which is expensive), but filter
out those which took minimal time (say, less than 1s or so). That would eventually leave only the
few functions that actually took more than 1s (which should have limited overhead).
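A rough sketch of that approach with today's ftrace: the function_graph tracer honors tracing_thresh, which (in microseconds) suppresses functions that return faster than the threshold, so only the slow ones remain in the trace. The 1s threshold and the per-task pid filter below are just illustrations:

```shell
cd /sys/kernel/tracing
echo 1000000 > tracing_thresh       # only log functions taking > 1s
echo function_graph > current_tracer
# Optionally restrict tracing to one task of interest:
# echo <pid> > set_ftrace_pid
cat trace                           # only the long-running functions remain
echo nop > current_tracer           # disable tracing when done
echo 0 > tracing_thresh
```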