Re: [Question] Sched: Severe scheduling latency (>10s) observed on kernel 6.12 with specific workload
From: Dietmar Eggemann
Date: Thu Apr 02 2026 - 11:06:58 EST
On 02.04.26 07:16, Xuewen Yan wrote:
> On Wed, Apr 1, 2026 at 9:00 PM Dietmar Eggemann
> <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 01.04.26 12:48, Xuewen Yan wrote:
>>> On Wed, Apr 1, 2026 at 6:05 PM Vincent Guittot
>>> <vincent.guittot@xxxxxxxxxx> wrote:
>>>>
>>>> On Wed, 1 Apr 2026 at 08:04, Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
>>>>>
>>>>> On Wed, Apr 1, 2026 at 12:25 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Mar 31, 2026 at 7:32 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Dear Linux maintainers and reviewers,
>>>>>>>
>>>>>>> I am writing to report a severe scheduling latency issue we recently
>>>>>>> discovered on Linux Kernel 6.12.
>>>>>>>
>>>>>>> Issue Description
>>>>>>>
>>>>>>> We observed that when running a specific background workload pattern,
>>>>>>> certain tasks experience excessive scheduling latency. The delay from
>>>>>>> the runnable state to running on the CPU exceeds 10 seconds, and in
>>>>>>> extreme cases, it reaches up to 100 seconds.
>>>>>>>
>>>>>>> Environment Details
>>>>>>>
>>>>>>> Kernel Version: 6.12.58-android16-6-g3835fd28159d-ab000018-4k
>>>>>>> Architecture: ARM64
>>>>>>> Hardware: T7300
>>
>> Is this 4 big & 4 little CPUs?
>
> 6 little + 2big.
> On our devices, background tasks are bound to cores 0-3. To mimic the
> behavior of these background tasks, we also bound rt-app to cores 0-3.
>
>>
>>>>>>> Config: gki_defconfig
>>>>>>>
>>>>>>> rt-app's workload pattern:
>>>>>>>
>>>>>>> {
>>>>>>>     "tasks" : {
>>>>>>>         "t0" : {
>>>>>>>             "instance" : 40,
>>>>>>>             "priority" : 0,
>>>>>>>             "cpus" : [ 0, 1, 2, 3 ],
>>>>>>>             "taskgroup" : "/background",
>>>>>>>             "loop" : -1,
>>>>>>>             "run" : 200,
>>>>>>>             "sleep" : 50
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> And we have applied the following patches:
>>>>>>>
>>>>>>> https://lore.kernel.org/all/20251216111321.966709786@xxxxxxxxxxxxxxxxxxx/
>>>>>>> https://lore.kernel.org/all/20260106170509.413636243@xxxxxxxxxxxxxxxxxxx/
>>>>>>> https://lore.kernel.org/all/20260323134533.805879358@xxxxxxxxxxxxxxxxxxx/
>>
>> Does the issue happen on v6.12.58 plain (android) or only when those 3
>> additional patches are applied on top?
>
> The issue was discovered on android16-6.12.58. We applied the
> following three patches, but the issue is still reproducible.
>
>>
>> d5843e1530d8 - sched/fair: Forfeit vruntime on yield (2025-12-18 Fernand
>> Sieber) v6.12.63
>>
>> bddd95054e33 - sched/eevdf: Fix min_vruntime vs avg_vruntime (2026-01-08
>> Peter Zijlstra) v6.12.64
>>
>> d2fc2dcfce47 - sched/fair: Fix zero_vruntime tracking (2026-03-25 Peter
>> Zijlstra) v6.12.78
>
> Thanks!

I tried to recreate your env as closely as possible on QEMU and ran your
rt-app file, but I can't spot anything suspicious either. This is with
defconfig and cgroup v2.

$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
512
512
512
512
512
512
1024
1024
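
For orientation, the load that the quoted rt-app file requests can be
estimated with a few lines of arithmetic. This is only a rough sketch:
it assumes every task would alternate run and sleep phases without
ever waiting for a CPU, and it ignores the capacity difference between
little and big CPUs. The variable names are illustrative; the numbers
come from the rt-app file above.

```python
# Values taken from the rt-app file quoted above.
instances = 40               # "instance"
run = 200                    # "run": busy phase per loop
sleep = 50                   # "sleep": sleep phase per loop
allowed_cpus = [0, 1, 2, 3]  # "cpus"

# Fraction of time one task wants to run if it never has to wait
# (assumption: no contention, no capacity scaling).
duty_cycle = run / (run + sleep)

# Aggregate demand, expressed in "number of fully busy CPUs".
demand = instances * duty_cycle

# How many times the allowed CPUs are oversubscribed.
oversubscription = demand / len(allowed_cpus)

print(f"duty cycle per task: {duty_cycle:.2f}")
print(f"aggregate demand: {demand:.1f} CPUs")
print(f"oversubscription on {len(allowed_cpus)} CPUs: {oversubscription:.1f}x")
```

Under these assumptions the 40 tasks ask for roughly 32 CPUs' worth of
run time on 4 CPUs, so CPUs 0-3 are expected to be permanently
saturated in this test.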

10 highest wu_lat values:

v6.6
0.024601000 task0-9:881
0.019151000 task0-13:885
0.018344000 task0-27:899
0.017332000 task0-5:876
0.010613000 task0-21:893
0.010356000 task0-20:892
0.007796000 task0-15:887
0.007550000 task0-13:885
0.007292000 task0-2:872
0.006718000 task0-15:887

6.12.58
0.029507000 task0-32:1211
0.027374000 task0-37:1216
0.027294000 task0-12:1191
0.027063000 task0-11:1190
0.026612000 task0-28:1207
0.024829000 task0-38:1217
0.024472000 task0-18:1197
0.024396000 task0-34:1213
0.024303000 task0-10:1189
0.023317000 task0-26:1205

tip sched/core (7.0.0-rc4-00030-g265439eb88fd)
0.025000000 task0-32:851
0.020467000 task0-5:824
0.017190000 task0-16:835
0.015365000 task0-8:827
0.011591000 task0-32:851
0.010153000 task0-34:853
0.009932000 task0-4:823
0.008972000 task0-24:843
0.008564000 task0-39:858
0.007591000 task0-25:844
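
A top-10 list like the ones above could be produced from
per-wakeup samples with something along these lines. This is a
hypothetical helper, not the tooling used for the numbers in this
mail: it assumes the input is plain text with one
"<latency> <comm>:<pid>" pair per line, and the name top_latencies is
made up for illustration.

```python
import heapq


def top_latencies(lines, n=10):
    """Return the n largest (latency, task) pairs, highest first.

    Assumed input format (hypothetical): "<latency> <comm>:<pid>" per
    line. Lines that do not match, such as kernel version headings,
    are skipped.
    """
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "<latency> <task>" pair
        try:
            latency = float(parts[0])
        except ValueError:
            continue  # first field is not a number
        samples.append((latency, parts[1]))
    return heapq.nlargest(n, samples)


if __name__ == "__main__":
    example = [
        "v6.6",
        "0.007796000 task0-15:887",
        "0.024601000 task0-9:881",
        "0.019151000 task0-13:885",
    ]
    for latency, task in top_latencies(example, n=2):
        print(f"{latency:.9f} {task}")
```

Sorting on the numeric value rather than on the text keeps the order
correct even if the latency field is not zero-padded to a fixed width.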