Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload

From: Vincent Guittot

Date: Wed Apr 01 2026 - 06:16:19 EST


On Wed, 1 Apr 2026 at 08:04, Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
>
> On Wed, Apr 1, 2026 at 12:25 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> >
> > On Tue, Mar 31, 2026 at 7:32 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> > >
> > > Dear Linux maintainers and reviewers,
> > >
> > > I am writing to report a severe scheduling latency issue we recently
> > > discovered on Linux Kernel 6.12.
> > >
> > > Issue Description
> > >
> > > We observed that when running a specific background workload pattern,
> > > certain tasks experience excessive scheduling latency. The delay from
> > > the runnable state to running on the CPU exceeds 10 seconds, and in
> > > extreme cases, it reaches up to 100 seconds.
> > >
> > > Environment Details
> > >
> > > Kernel Version: 6.12.58-android16-6-g3835fd28159d-ab000018-4k
> > > Architecture: ARM64
> > > Hardware: T7300
> > > Config: gki_defconfig
> > >
> > > rt-app workload pattern:
> > >
> > > {
> > >     "tasks" : {
> > >         "t0" : {
> > >             "instance" : 40,
> > >             "priority" : 0,
> > >             "cpus" : [ 0, 1, 2, 3 ],
> > >             "taskgroup" : "/background",
> > >             "loop" : -1,
> > >             "run" : 200,
> > >             "sleep" : 50
> > >         }
> > >     }
> > > }
> > >
> > > And we have applied the following patches:
> > >
> > > https://lore.kernel.org/all/20251216111321.966709786@xxxxxxxxxxxxxxxxxxx/
> > > https://lore.kernel.org/all/20260106170509.413636243@xxxxxxxxxxxxxxxxxxx/
> > > https://lore.kernel.org/all/20260323134533.805879358@xxxxxxxxxxxxxxxxxxx/
> > >
> > >
> > > Could you please advise whether there are known changes to EEVDF in
> > > 6.12 that might affect this specific workload pattern?
> > >
> >
> Thanks for the quick response!
>
> > Could you maybe instead point to some source for the runqslower binary
> > you attached? I don't think folks will run random binaries.
>
> We use the code in kernel "tools/bpf/runqslower".
>
> >
> > Also, it looks like the RT-app description uses the background cgroup,
> > can you share the cgroup configuration you have set for that?
>
> Our "background" cgroup does not have any special configurations applied.
>
> cpu.shares: Set to 1024, which is consistent with other cgroups on the system.
> Bandwidth Control: It is disabled (no cpu.cfs_quota_us limits set).
>
> >
> > Also, did you try to reproduce this against vanilla 6.12-stable? I'm
> > not sure the audience here is going to pay much attention to GKI-based
> > reports. Were you using any vendor hooks?
>
> We have verified this on a GKI kernel with all vendor hooks removed.
> The issue still reproduces in this environment. This suggests the
> problem is not directly caused by our vendor-specific modifications.

Did you try the latest Android mainline kernel, which is based on
v6.19? This would help determine whether the issue happens only on
v6.12 or on more recent kernels too.

I ran your rt-app JSON file on the latest tip/sched/core but I don't
see any scheduling issue.
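
For reference when comparing runs across kernels, the nominal load the
quoted rt-app file asks for can be estimated from its own numbers. The
following is only a rough sketch: the helper name nominal_demand is
hypothetical, it assumes every task would get its full "run" time without
contention, and it ignores everything else running on the system. The
result does not depend on the time unit of "run" and "sleep" because
only their ratio is used.

```python
import json

# Copy of the rt-app configuration quoted in this thread.
RT_APP_CONFIG = """
{
    "tasks": {
        "t0": {
            "instance": 40,
            "priority": 0,
            "cpus": [0, 1, 2, 3],
            "taskgroup": "/background",
            "loop": -1,
            "run": 200,
            "sleep": 50
        }
    }
}
"""


def nominal_demand(config_text):
    """Return (demand_in_cpus, oversubscription_ratio) for an rt-app config.

    Hypothetical helper: demand is the sum over task entries of
    instance * run / (run + sleep); the ratio divides that demand by the
    number of distinct CPUs the tasks are allowed to run on.
    """
    tasks = json.loads(config_text)["tasks"]
    demand = 0.0
    allowed_cpus = set()
    for spec in tasks.values():
        duty_cycle = spec["run"] / (spec["run"] + spec["sleep"])
        demand += spec.get("instance", 1) * duty_cycle
        allowed_cpus.update(spec["cpus"])
    return demand, demand / len(allowed_cpus)


if __name__ == "__main__":
    demand, ratio = nominal_demand(RT_APP_CONFIG)
    print(f"nominal demand: {demand:.1f} CPUs, oversubscription: {ratio:.1f}x")
```

With 40 instances at an 80% duty cycle pinned to 4 CPUs, the nominal
demand is about 32 CPUs, roughly 8 times what those CPUs can provide, so
some runnable delay is expected; the open question is only its magnitude.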

>
> We conducted an experiment by disabling the DELAY_DEQUEUE feature.
> After turning it off, we observed a significant increase in threads
> with extremely long runnable times. Even kworkers started exhibiting
> timeout phenomena.

Just to make sure, does the problem happen even if you don't disable DELAY_DEQUEUE?
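
One way to record the feature state alongside each test run is to parse
the scheduler features file. This is a minimal sketch: it assumes debugfs
is mounted at /sys/kernel/debug, that the file is readable (normally
root only), and that disabled features are listed with a NO_ prefix. The
function names feature_enabled and read_features are hypothetical.

```python
FEATURES_PATH = "/sys/kernel/debug/sched/features"  # assumes debugfs is mounted here


def feature_enabled(features_text, name):
    """Return True/False for an enabled/disabled feature, None if not listed.

    features_text is the whitespace-separated content of the features
    file, where a disabled feature appears as NO_<name>.
    """
    tokens = features_text.split()
    if name in tokens:
        return True
    if "NO_" + name in tokens:
        return False
    return None


def read_features(path=FEATURES_PATH):
    """Read the raw features list; needs debugfs and usually root."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Illustrative sample text, not output captured from the reported system.
    sample = "PLACE_LAG RUN_TO_PARITY NO_DELAY_DEQUEUE DELAY_ZERO"
    print("DELAY_DEQUEUE enabled:", feature_enabled(sample, "DELAY_DEQUEUE"))
```

On the device, feature_enabled(read_features(), "DELAY_DEQUEUE") can be
logged before each rt-app run so the results with and without the feature
are not mixed up.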

>
> Thanks!
>
> ---
> xuewen