Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload

From: Xuewen Yan

Date: Wed Apr 01 2026 - 07:03:21 EST


On Wed, Apr 1, 2026 at 6:05 PM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Wed, 1 Apr 2026 at 08:04, Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> >
> > On Wed, Apr 1, 2026 at 12:25 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 31, 2026 at 7:32 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> > > >
> > > > Dear Linux maintainers and reviewers,
> > > >
> > > > I am writing to report a severe scheduling latency issue we recently
> > > > discovered on Linux Kernel 6.12.
> > > >
> > > > Issue Description
> > > >
> > > > We observed that when running a specific background workload pattern,
> > > > certain tasks experience excessive scheduling latency. The delay from
> > > > the runnable state to running on the CPU exceeds 10 seconds, and in
> > > > extreme cases, it reaches up to 100 seconds.
> > > >
> > > > Environment Details
> > > >
> > > > Kernel Version: 6.12.58-android16-6-g3835fd28159d-ab000018-4k
> > > > Architecture: ARM64
> > > > Hardware: T7300
> > > > Config: gki_defconfig
> > > >
> > > > rt-app workload pattern:
> > > >
> > > > {
> > > >     "tasks" : {
> > > >         "t0" : {
> > > >             "instance" : 40,
> > > >             "priority" : 0,
> > > >             "cpus" : [ 0, 1, 2, 3 ],
> > > >             "taskgroup" : "/background",
> > > >             "loop" : -1,
> > > >             "run" : 200,
> > > >             "sleep" : 50
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > We have also applied the following patches:
> > > >
> > > > https://lore.kernel.org/all/20251216111321.966709786@xxxxxxxxxxxxxxxxxxx/
> > > > https://lore.kernel.org/all/20260106170509.413636243@xxxxxxxxxxxxxxxxxxx/
> > > > https://lore.kernel.org/all/20260323134533.805879358@xxxxxxxxxxxxxxxxxxx/
> > > >
> > > >
> > > > Could you please advise whether there are known EEVDF changes in
> > > > 6.12 that might affect this specific workload pattern?
> > > >
> > >
> > Thanks for the quick response!
> >
> > > Could you maybe instead point to some source for the runqslower binary
> > > you attached? I don't think folks will run random binaries.
> >
> > We built it from the in-kernel source at tools/bpf/runqslower.
> >
> > >
> > > Also, it looks like the RT-app description uses the background cgroup,
> > > can you share the cgroup configuration you have set for that?
> >
> > Our "background" cgroup does not have any special configurations applied.
> >
> > cpu.shares: set to 1024, the same as the other cgroups on the system.
> > Bandwidth control: disabled (no cpu.cfs_quota_us limit is set).
> >
> > >
> > > Also, did you try to reproduce this against vanilla 6.12-stable? I'm
> > > not sure the audience here is going to pay much attention to GKI-based
> > > reports. Were you using any vendor hooks?
> >
> > We have verified this on a GKI kernel with all vendor hooks removed.
> > The issue still reproduces in this environment. This suggests the
> > problem is not directly caused by our vendor-specific modifications.
>
> Did you try the latest Android mainline kernel, which is based on
> v6.19? This would help determine if the issue only happens on v6.12
> or on more recent kernels too.

We also tested this case on android kernel 6.18. The issue is still
reproducible, although the probability of occurrence is significantly
lower compared to 6.12.


>
> I ran your rt-app JSON file on the latest tip/sched/core, but I don't
> see any scheduling issue.
>
> >
> > We conducted an experiment by disabling the DELAY_DEQUEUE feature.
> > After turning it off, we observed a significant increase in the number
> > of threads with extremely long runnable times; even kworkers started
> > hitting timeouts.
>
> Just to make sure: the problem happens even if you don't disable DELAY_DEQUEUE?

Yes, we see this problem with both DELAY_DEQUEUE on and off.
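For anyone trying to reproduce both configurations: on kernels that expose the sched debugfs interface, a scheduler feature is typically disabled by writing its name with a NO_ prefix (e.g. NO_DELAY_DEQUEUE) to /sys/kernel/debug/sched/features and re-enabled by writing the bare name; this needs root and debugfs mounted. The helper below is a hypothetical sketch, not part of any existing tool; it takes the path as a parameter so it can be tried against an ordinary file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "FEAT" (enable) or "NO_FEAT" (disable) to a
 * sched features file such as /sys/kernel/debug/sched/features.
 * Returns 0 on success, -1 on error.
 */
static int set_sched_feature(const char *path, const char *feat, bool enable)
{
	char buf[64];
	int n;
	FILE *f;

	assert(path != NULL && feat != NULL);
	if (strlen(feat) == 0)
		return -1;		/* empty feature name */

	n = snprintf(buf, sizeof(buf), "%s%s", enable ? "" : "NO_", feat);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;		/* name too long for buf */

	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fputs(buf, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write, so a rejected write may only show up here */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling set_sched_feature("/sys/kernel/debug/sched/features", "DELAY_DEQUEUE", false) would then correspond to the "off" case; reading the file back lists each feature, with disabled ones shown with the NO_ prefix.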

Additionally, we noticed that the tasks suffering from long scheduling
latencies frequently belong to other cgroups (e.g., foreground) rather
than the background cgroup where the rt-app load is running. This
cross-group interference is unexpected and quite puzzling to us.
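To make the expectation concrete: with equal cpu.shares, CFS group scheduling should split each CPU between groups by weight. A rough model, loosely following calc_group_shares() in kernel/sched/fair.c, gives each group's per-CPU entity a weight of tg_shares * (group load on that CPU) / (group load on all CPUs). Assuming nice-0 tasks of weight 1024 and the 40 rt-app instances spread evenly over CPUs 0-3 (10 per CPU), the background entity on each of those CPUs weighs about 256, so a lone runnable foreground task there (group entity weight about 1024) should be entitled to roughly 1024 / (1024 + 256) = 80% of that CPU. This sketch ignores PELT averaging, the kernel's clamping and EEVDF lag; the function names are hypothetical.

```c
#include <assert.h>

/*
 * Simplified per-CPU weight of a task group's scheduling entity:
 * tg_shares * (group load on this CPU) / (group load on all CPUs).
 * Loosely modeled on calc_group_shares(); PELT and clamping are ignored.
 */
static double group_se_weight(double tg_shares, double cpu_load,
			      double total_load)
{
	assert(tg_shares > 0.0 && cpu_load >= 0.0 && total_load > 0.0);
	assert(cpu_load <= total_load);
	return tg_shares * cpu_load / total_load;
}

/* Fraction of a CPU one entity is entitled to, given its siblings' weight. */
static double cpu_fraction(double se_weight, double sibling_weight_sum)
{
	assert(se_weight > 0.0 && sibling_weight_sum >= 0.0);
	return se_weight / (se_weight + sibling_weight_sum);
}
```

In this model, group_se_weight(1024, 10 * 1024, 40 * 1024) gives 256 and cpu_fraction(1024, 256) gives 0.8, which is hard to reconcile with foreground tasks waiting for seconds.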

Thanks!
---
xuewen