Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload

From: Xuewen Yan

Date: Thu Apr 09 2026 - 23:31:46 EST


Hi John,

On Fri, Apr 10, 2026 at 5:39 AM John Stultz <jstultz@xxxxxxxxxx> wrote:
>
> On Tue, Mar 31, 2026 at 7:32 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
> >
> > I am writing to report a severe scheduling latency issue we recently
> > discovered on Linux Kernel 6.12.
> >
> > Issue Description
> >
> > We observed that when running a specific background workload pattern,
> > certain tasks experience excessive scheduling latency. The delay from
> > the runnable state to running on the CPU exceeds 10 seconds, and in
> > extreme cases, it reaches up to 100 seconds.
> >
> > Environment Details
> >
> > Kernel Version: 6.12.58-android16-6-g3835fd28159d-ab000018-4k
> > Architecture: ARM64
> > Hardware: T7300
> > Config: gki_defconfig
> >
> > rt-app workload pattern:
> >
> > {
> > "tasks" : {
> > "t0" : {
> > "instance" : 40,
> > "priority" : 0,
> > "cpus" : [ 0, 1, 2, 3 ],
> > "taskgroup" : "/background",
> > "loop" : -1,
> > "run" : 200,
> > "sleep" : 50
> > }
> > }
> > }
> >
>
> So, with this config I think I may have reproduced it on a device
> (using android16-6.12). I've not quite seen 10+ seconds, but I have
> seen >2 second delays for kworker threads (though usually the max seems
> to be around 600ms).

Thanks for the detailed update! It’s great to hear that you’ve managed
to reproduce the issue on a real device. Even though the delays you saw
peak at a bit over 2 seconds (rather than 10+), that still confirms the
problem exists. The difference in magnitude might just be due to
different background load conditions on the two devices.
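
For reference, here is a rough estimate of how much CPU time the rt-app
config quoted above asks for. It is only a back-of-the-envelope sketch:
it assumes "run" and "sleep" are in microseconds, and it ignores any
cpu shares or bandwidth limits applied to the /background group.

```python
# Rough offered-load estimate for the rt-app config quoted above.
# Assumption: "run" and "sleep" are given in microseconds.
instances = 40      # "instance": 40
run_us = 200        # "run": 200
sleep_us = 50       # "sleep": 50
cpus = 4            # "cpus": [0, 1, 2, 3]

duty_cycle = run_us / (run_us + sleep_us)   # share of time each thread wants a CPU
demand = instances * duty_cycle             # total demand, in units of CPUs
oversubscription = demand / cpus            # demand relative to the allowed CPUs

print(f"per-thread duty cycle: {duty_cycle:.0%}")
print(f"total demand: {demand:.0f} CPUs on {cpus} CPUs ({oversubscription:.0f}x)")
```

So under those assumptions CPUs 0-3 are oversubscribed by roughly 8x,
and anything else that has to run there competes with about 40
runnable threads.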

>
> Unfortunately, trying to reproduce using the same (android16-6.12)
> kernel branch with qemu initially hasn't been successful (and has been
> a bit of a yak shaving adventure: rt-app needs cgroupv1, which newer
> debian/systemd doesn't support any longer, so I installed a debian11
> image and had to build rt-app and its dependencies from source, then
> found perfetto binaries require a newer glibc so had to fetch and
> build perfetto from scratch as well). I can't see any similarly
> sized delays there.
>
> Out of curiosity, what are you using to detect the problem when you
> have rt-app running in the background? I've been tinkering with using
> cyclictest (-m -t -a --policy=SCHED_OTHER -b 1000000) to try to catch
> latencies over 1 sec, but curious if you had something better?
>
We use the runqslower eBPF tool; its source is in the kernel tree under
tools/bpf/runqslower.
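
In case it is useful for cross-checking against a trace, the idea
behind the measurement, as I understand it, is roughly the following:
remember the time a task becomes runnable (on wakeup, or when it is
preempted while still runnable) and report the delta when it is next
switched in. The sketch below is not the tool's code; the Event type,
the pids and the min_us parameter are made up for illustration, and it
runs on a synthetic event list rather than real tracepoints.

```python
from typing import NamedTuple


class Event(NamedTuple):
    ts_us: int                    # timestamp in microseconds
    kind: str                     # "wakeup" or "switch"
    pid: int                      # woken task (wakeup) or next task (switch)
    prev_pid: int = 0             # task being switched out (switch only)
    prev_runnable: bool = False   # True if prev was preempted while runnable


def runq_latencies(events, min_us):
    """Return (pid, delay_us) for every switch-in that waited >= min_us."""
    enqueued = {}   # pid -> time at which the task became runnable
    slow = []
    for ev in events:
        if ev.kind == "wakeup":
            enqueued[ev.pid] = ev.ts_us
        elif ev.kind == "switch":
            # A preempted task goes straight back to waiting for a CPU.
            if ev.prev_runnable:
                enqueued[ev.prev_pid] = ev.ts_us
            start = enqueued.pop(ev.pid, None)
            if start is not None and ev.ts_us - start >= min_us:
                slow.append((ev.pid, ev.ts_us - start))
    return slow


# Synthetic trace with hypothetical pids: task 200 is woken at t=1000us
# but is only switched in at t=2001000us.
trace = [
    Event(1_000, "wakeup", pid=200),
    Event(1_500, "switch", pid=100, prev_pid=0),
    Event(2_001_000, "switch", pid=200, prev_pid=100, prev_runnable=True),
    Event(2_001_200, "switch", pid=100, prev_pid=200),
]
print(runq_latencies(trace, min_us=1_000_000))
```

With a 1 second threshold this reports only task 200 (2000000 us of
waiting); the 200 us wait of task 100 stays below the threshold.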

Thanks!
BR
---
xuewen