Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload
From: Xuewen Yan
Date: Tue Apr 14 2026 - 01:20:26 EST
Thanks Dietmar!
On Fri, Apr 10, 2026 at 7:14 PM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:
>
> On 08.04.26 03:50, Xuewen Yan wrote:
> > Hi Dietmar and Vincent,
> >
> >> I tried to recreate your env as much as possible on qemu and ran your
> >> rt-app file but I can't spot anything suspicious either. This is with
> >> defconfig and cgroupv2.
> >
> > Could you please try the following configuration?
> > To rule out Android's influence, I created two new cgroups:
> > foreground_test and background_test.
> > I then placed only rt-app threads into these groups. Even with this
> > setup, we can still observe high scheduling latency for tasks in
> > foreground_test.
>
> Is this still on an Android (vendor hooks, etc.) or mainline 6.12.58
> kernel/device?
We tested on an Android tree without any vendor hooks.
As John said, after we reverted commit 6d71a9c61604 ("sched/fair: Fix
EEVDF entity placement bug causing scheduling lag") in android16-6.12,
the latency disappeared; that commit does not exist in the stable tree.
On the other hand, we also tested android17-6.18 without any revert,
and the latency still exists there.
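For reference, the revert check amounts to the following (dry-run sketch
printed rather than executed; the commit id is the one from this thread,
to be run inside the android16-6.12 kernel tree before rebuilding):

```shell
#!/bin/sh
# Dry-run sketch of the revert test described above.
# The commit id comes from this thread; nothing here touches the tree.
suspect=6d71a9c61604
echo "git revert --no-edit $suspect"
```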
>
> > {
> > "tasks" : {
> > "t0" : {
> > "instance" : 40,
> > "priority" : 0,
> > "cpus" : [ 0, 1, 2, 3 ],
> > "taskgroup" : "/background_test",
> > "loop" : -1,
> > "run" : 200,
> > "sleep" : 50
> > },
> > "t1" : {
> > "instance" : 2,
> > "priority" : 19,
> > "cpus" : [ 0, 1, 2, 3 ],
> > "taskgroup" : "/foreground_test",
> > "loop" : -1,
> > "run" : 60000,
> > "sleep" : 100000
> > },
> > "t2" : {
> > "instance" : 2,
> > "priority" : 10,
> > "cpus" : [ 0, 1, 2, 3 ],
> > "taskgroup" : "/foreground_test",
> > "loop" : -1,
> > "run" : 5000,
> > "sleep" : 100000
> > }
> > }
> > }
>
> With your rt-app file and moving the tasks into cgroupv2 taskgroups
> manually:
>
> t0-0 679 0 > /sys/fs/cgroup/A
> t0-1 680 0 > /sys/fs/cgroup/A
> t0-2 681 0 > /sys/fs/cgroup/A
> ...
> t0-37 716 0 > /sys/fs/cgroup/A
> t0-38 717 0 > /sys/fs/cgroup/A
> t0-39 718 0 > /sys/fs/cgroup/A
> t1-40 719 19 > /sys/fs/cgroup/B
> t1-41 720 19 > /sys/fs/cgroup/B
> t2-42 721 10 > /sys/fs/cgroup/B
> t2-43 722 10 > /sys/fs/cgroup/B
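
For anyone reproducing this, the per-thread moves above can be scripted
roughly like this (sketch, printed as a dry run; it assumes cgroup v2 is
mounted at /sys/fs/cgroup, the A/B groups already exist in threaded mode
so individual TIDs can be placed via cgroup.threads, and root privileges):

```shell
#!/bin/sh
# Dry-run sketch: emit the commands that would move each rt-app thread
# (TID) into its cgroup v2 group, matching the placement listed above.
move_tid() {  # move_tid <tid> <group>
  echo "echo $1 > /sys/fs/cgroup/$2/cgroup.threads"
}
move_tid 679 A   # t0-0
move_tid 719 B   # t1-40
move_tid 721 B   # t2-42
```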
>
> 10 highest wu_lat values on Arm64 qemu (-accel hvf):
>
> v6.6
>
> 0.564159000 t2-43:684
> 0.306418000 t2-42:683
> 0.237134000 t2-43:684
> 0.166982000 t2-42:683
> 0.166674000 t2-42:683
> 0.161856000 t2-42:683
> 0.098879000 t2-43:684
> 0.097746000 t2-43:684
> 0.083329000 t2-42:683
> 0.082943000 t2-42:683
>
> 6.12.58
>
> 0.368566000 t2-43:939
> 0.228139000 t2-42:938
> 0.212454000 t2-43:939
> 0.207144000 t2-42:938
> 0.177373000 t2-43:939
> 0.148268000 t2-42:938
> 0.147619000 t2-42:938
> 0.125988000 t2-43:939
> 0.091564000 t2-42:938
> 0.088160000 t2-42:938
>
> tip sched/core (7.0.0-rc6-00050-g985215804dcb)
>
> 0.395585000 t2-43:697
> 0.203889000 t2-43:697
> 0.101130000 t2-42:696
> 0.098782000 t2-42:696
> 0.084523000 t2-43:697
> 0.033895000 t2-42:696
> 0.031881000 t2-43:697
> 0.021958000 t2-42:696
> 0.018123000 t2-42:696
> 0.013132000 t0-7:661
>
> Could you specify which tasks had those > 1s wu_lat values?
>
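One quick way to answer that from an rt-app log is to reduce the wu_lat
samples ("<latency> <task>:<pid>" lines, in the format shown above) to a
per-task maximum; whichever task tops 1s is the culprit. A plain-awk
sketch with illustrative sample values:

```shell
#!/bin/sh
# Sketch: per-task maximum wakeup latency from wu_lat samples.
# The sample lines below are illustrative, copied from the v6.6 list above.
samples='0.564159000 t2-43:684
0.306418000 t2-42:683
0.237134000 t2-43:684
0.013132000 t0-7:661'
per_task_max=$(printf '%s\n' "$samples" |
  awk '{ split($2, a, ":"); if ($1 > m[a[1]]) m[a[1]] = $1 }
       END { for (t in m) print t, m[t] }')
echo "$per_task_max"
```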