Re: [PATCH] sched: Further restrict the preemption modes
From: Shrikanth Hegde
Date: Mon Mar 09 2026 - 05:16:43 EST
On 2/27/26 8:58 PM, Shrikanth Hegde wrote:
Hi Steve.
On 2/27/26 8:23 PM, Steven Rostedt wrote:
On Fri, 27 Feb 2026 14:39:42 +0530
Shrikanth Hegde <sshegde@xxxxxxxxxxxxx> wrote:
I am afraid we will have to trace all functions to begin with (which is expensive), then filter
out those which took minimal time (say, less than 1s or so). That would eventually leave only a
few functions that actually took more than 1s (which should have limited overhead).
Is it possible to remove tracing for some functions from within the kernel after they
have been enabled? Or can this only be done from userspace by looking at the trace buffer?
Even if it is doable, this would allow us to trace functions that took a lot of time.
But shouldn't we be aiming to identify the kernel paths that took a lot of time?
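As a rough illustration of the filtering step described above, here is a minimal Python sketch that runs on already-collected data. The `Call` record, the function names `func_a` to `func_d`, and the durations are all hypothetical; nothing here is real trace output or an ftrace interface.

```python
from collections import namedtuple

# Hypothetical record: one completed function call with its measured duration.
Call = namedtuple("Call", ["func", "duration_s"])

def slow_calls(calls, threshold_s=1.0):
    """Keep only the calls whose duration is at or above the threshold."""
    return [c for c in calls if c.duration_s >= threshold_s]

# Made-up sample data for illustration only.
sample = [
    Call("func_a", 0.0002),
    Call("func_b", 1.7),
    Call("func_c", 0.00001),
    Call("func_d", 2.4),
]

for call in slow_calls(sample):
    print(f"{call.func}: {call.duration_s:.1f}s")
```

Note that a per-function threshold like this would miss a kernel path made up of many short functions, which is the concern raised in the last question above.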
Well, I think the detection can be done with timings between schedules.
What's the longest running task without any voluntary schedule. Then you
can add function graph tracing to it where it can possibly trigger in the
location that detected the issue.
This would not work either. We will have sched in/sched out even when
running in userspace.
Let's say a user makes a syscall: the process will continue to be in R state.
We only need to track long running time in the kernel, not in userspace.
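To make the distinction concrete, here is a small Python sketch over a hypothetical, time-ordered event list for a single task. The event labels (`sched_in`, `sched_out`, `sys_enter`, `sys_exit`) and the timestamps are assumptions for illustration, not real tracepoint output. It shows how time between sched in and sched out can be large even when time spent inside the syscall is small.

```python
def on_cpu_and_kernel_time(events):
    """Return (on_cpu, in_kernel) seconds for one task.

    events: time-ordered (timestamp_s, name) tuples, where name is one of
    "sched_in", "sched_out", "sys_enter", "sys_exit" (hypothetical labels).
    """
    on_cpu = 0.0
    in_kernel = 0.0
    running = False
    in_syscall = False
    last = None
    for ts, name in events:
        # Account for the interval that just ended, using the state before this event.
        if running and last is not None:
            delta = ts - last
            on_cpu += delta
            if in_syscall:
                in_kernel += delta
        if name == "sched_in":
            running = True
        elif name == "sched_out":
            running = False
        elif name == "sys_enter":
            in_syscall = True
        elif name == "sys_exit":
            in_syscall = False
        last = ts
    return on_cpu, in_kernel

# Made-up timeline: 5s on CPU, of which only 0.5s is inside a syscall.
events = [
    (0.0, "sched_in"),
    (3.0, "sys_enter"),
    (3.5, "sys_exit"),
    (5.0, "sched_out"),
]
print(on_cpu_and_kernel_time(events))
```

A detector keyed only on sched in/sched out would flag this task for its 5s run, although almost all of that time was spent in userspace.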
On a detection of a long schedule, a stack trace can be recorded. Using
that stack trace, you could use the function graph tracer to see what is
happening.
Anyway, something to think about, and this could be a topic at this year's
Linux Plumbers Tracing MC ;-)
We could track the kernel paths, i.e. the different entry/exit points into the kernel.
1. syscall entry/exit.
2. irq entry/exit.
3. kworker threads.
For 1 and 2 we have tracepoints already. For 3, we can use sched in/sched out tracepoints
to see if and when it takes a long time.
All of them could be combined in one BPF program. Any thoughts?
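The combined bookkeeping could look roughly like the following Python sketch. It is a userspace simulation of the logic only, not a BPF program: the event names, the `key` field (a pid or CPU number), and the sample timeline are hypothetical. It also reports any long sched in/sched out span for a kworker and leaves out the R state check.

```python
# Map hypothetical entry/exit event names to the kind of kernel section they bracket.
ENTRY_TO_KIND = {"sys_enter": "syscall", "irq_entry": "irq", "sched_in": "kworker"}
EXIT_TO_KIND = {"sys_exit": "syscall", "irq_exit": "irq", "sched_out": "kworker"}

def find_long_sections(events, threshold_s):
    """Return (kind, key, duration_s) for every section longer than threshold_s.

    events: time-ordered (timestamp_s, name, key) tuples; key identifies the
    task or CPU so that nested or interleaved sections are tracked separately.
    """
    start = {}    # (kind, key) -> entry timestamp, like a hash map in BPF
    reports = []
    for ts, name, key in events:
        if name in ENTRY_TO_KIND:
            start[(ENTRY_TO_KIND[name], key)] = ts
        elif name in EXIT_TO_KIND:
            kind = EXIT_TO_KIND[name]
            began = start.pop((kind, key), None)
            if began is not None and ts - began > threshold_s:
                reports.append((kind, key, ts - began))
    return reports

# Made-up timeline: one slow syscall, one short irq, one short kworker run.
events = [
    (0.0, "sys_enter", 101),
    (0.2, "irq_entry", 0),
    (0.3, "irq_exit", 0),
    (2.5, "sys_exit", 101),
    (3.0, "sched_in", 7),
    (3.1, "sched_out", 7),
]
print(find_long_sections(events, threshold_s=1.0))
```

In this sketch the check only happens at the exit event, so a section that never exits is never reported; a real detector would likely need a timer or a check at sched out as well.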
Getting a stack trace for 3 is doable I guess, i.e. when sched_out happens while in R state and the time
check has failed. But for 1 and 2, getting a stack is going to be difficult.
Please add any other kernel paths I have missed where we want detection to happen.