Re: [QUESTION] sched/fair: EEVDF min_slice stalls in parent cgroup with a continuously running child task
From: Vincent Guittot
Date: Wed Jun 17 2026 - 03:29:49 EST
On Wed, 17 Jun 2026 at 05:56, Chen Jinghuang <chenjinghuang2@xxxxxxxxxx> wrote:
>
> Hi all,
>
> I observed an unexpected behavior regarding the EEVDF min_slice update
> mechanism in a hierarchical cgroup v1 setup. It appears that the parent
> cgroup's min_slice can become stale when a child cgroup contains a
> continuously running task (100% load, no sleep). The parent's min_slice
> fails to update when global or entity slices change, until a dequeue/enqueue
> event occurs in the child cgroup.
>
> Here is the topology of the scenario:
>
> Root cfs_rq
> |
> cgroup A (se A, contains cpu_sim_A)
> |
> cgroup A1 (se A1, contains cpu_sim_A1, 100% CPU load)
>
> Steps to reproduce:
What is the value of /sys/kernel/debug/sched/base_slice_ns before
starting your test?
> 1. Create cgroup A, and a sub-cgroup A1 under A.
> 2. Move a task (cpu_sim_A) into cgroup A, and use the syscall
> __NR_sched_setattr to explicitly set its slice to 3ms.
> 3. Set the global base_slice_ns to 3ms:
> echo 3000000 > /sys/kernel/debug/sched/base_slice_ns
> 4. Move a 100% load task (cpu_sim_A1, which never voluntarily sleeps) into
> cgroup A1.
> At this point, both cgroup A's min_slice and cgroup A1's min_slice are
> observed as 0.1ms (the initialized or previous low value).
This is quite a short slice value.
> 5. Change the global base_slice_ns to 2.8ms:
> echo 2800000 > /sys/kernel/debug/sched/base_slice_ns
Changing the global /sys/kernel/debug/sched/base_slice_ns at runtime is
not expected, because this value is assumed to be a constant default.
It could be changed at boot or similar, but not in the middle of a use
case.
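
For reference, a per-task slice like the one in step 2 can be requested
roughly as follows. This is a minimal sketch, not the reporter's test
program: struct slice_attr, make_slice_attr() and set_task_slice() are
hypothetical names, and the struct is a local mirror of the first 48 bytes
of the kernel's struct sched_attr, on the assumption that the kernel reads
sched_runtime as the requested slice for a fair task.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Local mirror of the first 48 bytes of the kernel's struct sched_attr.
 * The name is hypothetical, chosen so it cannot clash with a libc that
 * already declares struct sched_attr.
 */
struct slice_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* assumed: requested slice in ns for a fair task */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Build an attribute block requesting slice_ns for a SCHED_OTHER task. */
struct slice_attr make_slice_attr(uint64_t slice_ns)
{
	struct slice_attr attr;

	assert(slice_ns > 0);
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = 0;          /* SCHED_OTHER */
	attr.sched_runtime = slice_ns;  /* sched_nice stays 0, so nice is set to 0 too */
	return attr;
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int set_task_slice(pid_t pid, uint64_t slice_ns)
{
	struct slice_attr attr = make_slice_attr(slice_ns);

	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling set_task_slice(0, 3000000) would request a 3ms slice for the
calling thread; passing the pid of cpu_sim_A would target that task
instead.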
>
> Observations:
> - cgroup A1's min_slice correctly updates to 2.8ms.
> - However, cgroup A's min_slice remains unchanged (stuck at 0.1ms).
> - Even if we use syscall(__NR_sched_setattr) to explicitly set cpu_sim_A's
> slice in cgroup A to 2.8ms, cgroup A's min_slice still does not update.
> - The only way to force cgroup A's min_slice to update to 2.8ms is to
> trigger a dequeue/enqueue cycle for the task in cgroup A1 (e.g., by
> renicing it).
When an entity uses the default slice value, this system-wide default
value is not expected to change. We don't have a way to trigger an
update of all cgroups on all CPUs, and we don't want one.
>
> It appears that a parent cgroup's min_slice is only updated when its
> children are enqueued or dequeued. If a child task runs continuously
> without sleeping, the parent's min_slice gets stuck and ignores any
> changes to base_slice_ns or individual entity slices.
It ignores changes to base_slice_ns but catches changes to the
entity's custom slice.
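
To illustrate the staleness described in the report, here is a toy model
in plain C. It is only a sketch under a strong assumption: that a parent
group caches its child's minimum slice and refreshes that cache only when
the child is enqueued or dequeued. The toy_group type and the toy_*()
functions are hypothetical and much simpler than the real cfs_rq and
sched_entity code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical type, not the kernel's cfs_rq or sched_entity. */
struct toy_group {
	struct toy_group *parent;
	uint64_t task_slice;    /* slice of the group's own runnable task, in ns */
	uint64_t child_slice;   /* value cached from the child group, UINT64_MAX if none */
};

/* Minimum over what the group sees: its own task and the cached child value. */
uint64_t toy_min_slice(const struct toy_group *g)
{
	return g->task_slice < g->child_slice ? g->task_slice : g->child_slice;
}

/* Modelled enqueue/dequeue: walk up and refresh each ancestor's cached value. */
void toy_requeue(struct toy_group *g)
{
	assert(g != NULL);
	for (; g->parent != NULL; g = g->parent)
		g->parent->child_slice = toy_min_slice(g);
}
```

In this model, with A holding a 3ms task and A1 enqueued while its task
had a 0.1ms slice, raising A1's task slice to 2.8ms changes
toy_min_slice() for A1 immediately, but A keeps reporting 0.1ms until
toy_requeue() is called on A1. Changing A's own task slice is seen at
once, yet the minimum stays at the stale 0.1ms cached from A1.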
>
> Is this expected by design in EEVDF, or is there a missing update hook?
>
> Any insights would be greatly appreciated.
>
> Thanks,
> Chen Jinghuang