Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload

From: Xuewen Yan

Date: Wed Apr 01 2026 - 02:06:14 EST

On Wed, Apr 1, 2026 at 12:25 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
>
> On Tue, Mar 31, 2026 at 7:32 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> >
> > Dear Linux maintainers and reviewers,
> >
> > I am writing to report a severe scheduling latency issue we recently
> > discovered on Linux Kernel 6.12.
> >
> > Issue Description
> >
> > We observed that when running a specific background workload pattern,
> > certain tasks experience excessive scheduling latency. The delay from
> > the runnable state to running on the CPU exceeds 10 seconds, and in
> > extreme cases, it reaches up to 100 seconds.
> >
> > Environment Details
> >
> > Kernel Version: 6.12.58-android16-6-g3835fd28159d-ab000018-4k
> > Architecture: ARM64
> > Hardware: T7300
> > Config: gki_defconfig
> >
> > rt-app workload pattern:
> >
> > {
> > "tasks" : {
> > "t0" : {
> > "instance" : 40,
> > "priority" : 0,
> > "cpus" : [ 0, 1, 2, 3 ],
> > "taskgroup" : "/background",
> > "loop" : -1,
> > "run" : 200,
> > "sleep" : 50
> > }
> > }
> > }
> >
> > We have also applied the following patches:
> >
> > https://lore.kernel.org/all/20251216111321.966709786@xxxxxxxxxxxxxxxxxxx/
> > https://lore.kernel.org/all/20260106170509.413636243@xxxxxxxxxxxxxxxxxxx/
> > https://lore.kernel.org/all/20260323134533.805879358@xxxxxxxxxxxxxxxxxxx/
> >
> >
> > Could you please advise whether there are known EEVDF changes in
> > 6.12 that might affect this specific workload pattern?
> >
>
Thanks for the quick response!

> Could you maybe instead point to some source for the runqslower binary
> you attached? I don't think folks will run random binaries.

We use the code from the kernel tree, under "tools/bpf/runqslower".

>
> Also, it looks like the RT-app description uses the background cgroup,
> can you share the cgroup configuration you have set for that?

Our "background" cgroup does not have any special configurations applied.

cpu.shares: set to 1024, which is consistent with the other cgroups on the system.
Bandwidth control: disabled (no cpu.cfs_quota_us limit is set).
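
For anyone who wants to double-check the same two values on their own
device, here is a minimal sketch. It assumes a cgroup v1 cpu controller;
the mount point /dev/cpuctl/background is only an assumption (adjust it
to your system), and the helper names are made up for illustration.

```python
from pathlib import Path


def read_cpu_settings(cgroup_dir):
    """Read cpu.shares and cpu.cfs_quota_us from a cgroup v1 cpu directory.

    A missing file is reported as None instead of raising.
    """
    base = Path(cgroup_dir)
    settings = {}
    for name in ("cpu.shares", "cpu.cfs_quota_us"):
        path = base / name
        settings[name] = int(path.read_text().strip()) if path.exists() else None
    return settings


def bandwidth_limited(settings):
    """In cgroup v1, cpu.cfs_quota_us == -1 means no bandwidth limit."""
    quota = settings.get("cpu.cfs_quota_us")
    return quota is not None and quota > 0


if __name__ == "__main__":
    # Hypothetical path: the location of the "background" group is an assumption.
    s = read_cpu_settings("/dev/cpuctl/background")
    print(s, "bandwidth limited:", bandwidth_limited(s))
```

With the configuration described above, this should report cpu.shares
as 1024 and no bandwidth limit.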

>
> Also, did you try to reproduce this against vanilla 6.12-stable ? I'm
> not sure the audience here is going to pay much attention to GKI based
> reports. Were you using any vendorhooks?

We have verified this on a GKI kernel with all vendor hooks removed.
The issue still reproduces in this environment. This suggests the
problem is not directly caused by our vendor-specific modifications.

We also ran an experiment with the DELAY_DEQUEUE scheduler feature
disabled. With it turned off, we observed a significant increase in
the number of threads with extremely long runnable times. Even
kworkers started timing out.
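
In case it helps with reproducing the experiment, below is a rough
sketch of toggling the feature through the scheduler debugfs file. It
assumes debugfs is mounted at /sys/kernel/debug, that the kernel has
CONFIG_SCHED_DEBUG enabled, and that it runs as root; the helper names
are hypothetical.

```python
from pathlib import Path

# Assumed location: requires debugfs mounted at /sys/kernel/debug.
FEATURES = Path("/sys/kernel/debug/sched/features")


def feature_enabled(features_text, name):
    """Parse the contents of the features file.

    Enabled features are listed by name, disabled ones with a NO_ prefix.
    """
    tokens = features_text.split()
    if name in tokens:
        return True
    if "NO_" + name in tokens:
        return False
    raise KeyError(name)


def set_feature(name, enabled, path=FEATURES):
    """Write NAME to enable a feature, or NO_NAME to disable it."""
    Path(path).write_text(name if enabled else "NO_" + name)


if __name__ == "__main__":
    try:
        set_feature("DELAY_DEQUEUE", False)
        state = feature_enabled(FEATURES.read_text(), "DELAY_DEQUEUE")
        print("DELAY_DEQUEUE enabled:", state)
    except OSError as err:
        print("could not access", FEATURES, "-", err)
```

Calling set_feature("DELAY_DEQUEUE", True) afterwards restores the
feature.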

Thanks!

---
xuewen