Re: [PATCH v2] sched/eevdf: Force propagating min_slice of cfs_rq when a task changing slice

From: Tianchen Ding
Date: Thu Nov 14 2024 - 01:37:16 EST


On 2024/11/14 14:06, 解 咏梅 wrote:
Let's analyze it case by case :P

Say cgroup A has 3 tasks: task A, task B and task C.

1) Assign task A's slice to 0.1ms; task B and task C both have the default slice (0.75ms).

2) Task A is picked by __schedule() as the next task. Because task A is still on the rq,
the cfs_rq hierarchy doesn't have to change any cfs_rq's min_slice; it is already reported up to the root cgroup.

3) Task A is preempted by another task but is still runnable. It will be requeued to cgroup A's cfs_rq. Similar to case 2.

4) Task A blocks. Task A's se will be retained in cgroup A's cfs_rq until it reaches the 0-lag state.
4.1 Before 0-lag, I guess it's similar to case 2.
The logic is based on the cfs_rq's avg_runtime; it is supposed that task A won't be picked as the next task before it reaches the 0-lag state.
If my understanding is wrong, please correct me. Thanks.
4.2 After it reaches the 0-lag state, if it is picked by pick_task_fair(), it will ultimately be removed from cgroup A's cfs_rq:
pick_next_entity() -> dequeue_entities(DEQUEUE_SLEEP | DEQUEUE_DELAYED) -> __dequeue_entity() (task A)
So cgroup A's cfs_rq's min_slice will be re-calculated, and the cfs_rq hierarchy will update its min_slice bottom-up.
4.3 After it reaches the 0-lag state, it may be woken up. Because the current __schedule() splits the block/sleep path from the migration path, and only the migration path calls deactivate, p->on_rq is still 1, so ttwu_runnable() handles it by just calling requeue_delayed_entity(). Similar to case 2.

I think only case 1 has such a problem.

Regards,
Yongmei.


I think you misunderstood the case. We're not talking about the DELAY_DEQUEUE feature. We're simply talking about enqueue (waking up) and dequeue (sleeping).
For convenience, let's turn DELAY_DEQUEUE off.

Consider the following cgroup hierarchy on one cpu:


            root_cgroup
                 |
       ----------------------
       |                    |
  cgroup_A(curr)     other_cgroups...
       |
  ----------------
  |              |
any_se(curr)  cgroup_B(runnable)
                 |
          ----------------
          |              |
    task_A(sleep)  task_B(runnable)

Assume task_A has a smaller slice (0.1ms) and all other tasks have the default slice (0.75ms).

Because task_A is sleeping, it is not actually on the tree.

Now task_A is woken up. It is enqueued to cgroup_B, so the slice of cgroup_B is updated to 0.1ms. This is ok.

However, since cgroup_B is already on_rq, it cannot be "enqueued" again to cgroup_A. The code falls through to the bottom half (the second for_each_sched_entity loop in enqueue_task_fair()).

So the slice of cgroup_A is not updated. It is still 0.75ms.
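To make the failure mode concrete, here is a small userspace sketch. This is NOT the kernel code: struct toy_se and the helper names are invented for illustration, and it models only the relevant shape of enqueue_task_fair(), namely that the first for_each_sched_entity loop stops at the first entity that is already on_rq, so min_slice is not folded into the queues above that point (slices in microseconds: 0.75ms -> 750, 0.1ms -> 100).

```c
/* Toy model of min_slice propagation stopping at an on_rq entity. */
struct toy_se {
	int on_rq;             /* already queued on the parent's cfs_rq? */
	long slice;            /* this entity's requested slice */
	long queue_min_slice;  /* min_slice of the cfs_rq this se owns */
	struct toy_se *parent; /* se of the parent cgroup, or 0 at root */
};

/* Enqueue one entity and fold its slice into the parent's min_slice. */
static void toy_enqueue_entity(struct toy_se *se)
{
	se->on_rq = 1;
	if (se->parent && se->slice < se->parent->queue_min_slice)
		se->parent->queue_min_slice = se->slice;
}

/*
 * Shape of the first loop in enqueue_task_fair(): walk up the
 * hierarchy enqueueing entities, but break at the first one that is
 * already on_rq. The second loop (load tracking etc.) is elided
 * because it does not recompute min_slice of the queues further up.
 */
static void toy_enqueue_task(struct toy_se *se)
{
	for (; se; se = se->parent) {
		if (se->on_rq)
			break; /* cgroup_B: propagation stops here */
		toy_enqueue_entity(se);
	}
}

/* The scenario above: task_A (0.1ms, sleeping) under
 * cgroup_B (runnable) under cgroup_A (curr). */
static struct toy_se cgroup_A = { 1, 750, 750, 0 };
static struct toy_se cgroup_B = { 1, 750, 750, &cgroup_A };
static struct toy_se task_A   = { 0, 100, 0,   &cgroup_B };
```

Running toy_enqueue_task(&task_A) updates cgroup_B's min_slice to 100 (0.1ms) but leaves cgroup_A's at 750 (0.75ms), which is exactly the stale value described above.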

Thanks.