Let me analyze it case by case:
Say cgroup A has three tasks: task A, task B, and task C.
1) Task A's slice is set to 0.1 ms; task B and task C both keep the default slice (0.75 ms). (How such a slice might be assigned from user space is sketched at the end of this mail.)
2) Task A is picked by __schedule() as the next task. Because task A is still on the rq,
the cfs_rq hierarchy does not have to change any cfs_rq's min_slice; task A's short slice keeps being reported up to the root cgroup's cfs_rq.
3) Task A is preempted by another task but is still runnable, so it is requeued on cgroup A's cfs_rq. Similar to case 2.
4) Task A is switched out because it blocked. Task A's se is retained on cgroup A's cfs_rq until it reaches the 0-lag state.
4.1) Before reaching 0-lag, I guess this is similar to case 2.
The eligibility logic is based on the cfs_rq's avg_vruntime, so task A is not supposed to be picked as the next task before it reaches the 0-lag state.
If my understanding is wrong, please correct me. Thanks.
4.2) After it has reached the 0-lag state: if it is picked by pick_task_fair(), it will ultimately be removed from cgroup A's cfs_rq via
pick_next_entity() -> dequeue_entities(DEQUEUE_SLEEP | DEQUEUE_DELAYED) -> __dequeue_entity() (task A's se).
So cgroup A's cfs_rq min_slice will be recalculated, and the cfs_rq hierarchy will update its min_slice bottom-up (a toy sketch of this follows right after the cases).
4.3) After it has reached the 0-lag state, the task is woken up. The current __schedule() splits the block/sleep path from the migration path, and only the migration path calls deactivate, so p->on_rq is still 1; ttwu_runnable() therefore handles the wakeup by just calling requeue_delayed_entity(). Similar to case 2.
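
For 4.2, here is a minimal user-space sketch of the bottom-up min_slice update I mean. It is not the kernel code: the structs and helpers (toy_cfs_rq, toy_se, recompute_min_slice, toy_dequeue_entity) are invented for illustration, and only the idea -- once __dequeue_entity() finally removes the delayed se, the cgroup's cfs_rq re-derives its min_slice and pushes the new value up the hierarchy -- comes from the cases above:

#include <stdio.h>
#include <stdint.h>

#define DEFAULT_SLICE_NS 750000ULL      /* 0.75 ms default slice */

struct toy_se {
    uint64_t slice;                     /* per-entity slice, in ns */
    int on_rq;                          /* still enqueued on its cfs_rq? */
};

struct toy_cfs_rq {
    struct toy_se *entities[8];
    int nr;
    uint64_t min_slice;                 /* min slice over enqueued entities */
    struct toy_se *tg_se;               /* group entity in the parent cfs_rq */
    struct toy_cfs_rq *parent;
};

/* Re-derive min_slice from whatever is still enqueued. */
static void recompute_min_slice(struct toy_cfs_rq *cfs_rq)
{
    uint64_t min = DEFAULT_SLICE_NS;

    for (int i = 0; i < cfs_rq->nr; i++) {
        struct toy_se *se = cfs_rq->entities[i];

        if (se->on_rq && se->slice < min)
            min = se->slice;
    }
    cfs_rq->min_slice = min;
}

/* Case 4.2: the delayed se is finally removed -> fix min_slice bottom-up. */
static void toy_dequeue_entity(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
    se->on_rq = 0;
    for (struct toy_cfs_rq *q = cfs_rq; q; q = q->parent) {
        recompute_min_slice(q);
        if (q->tg_se)                   /* push the new minimum upward */
            q->tg_se->slice = q->min_slice;
    }
}

int main(void)
{
    struct toy_cfs_rq root = { .min_slice = DEFAULT_SLICE_NS };
    struct toy_se group_se = { .slice = DEFAULT_SLICE_NS, .on_rq = 1 };
    struct toy_cfs_rq cgroup_a = { .min_slice = DEFAULT_SLICE_NS,
                                   .tg_se = &group_se, .parent = &root };
    struct toy_se task_a = { .slice = 100000, .on_rq = 1 };     /* 0.1 ms */
    struct toy_se task_b = { .slice = DEFAULT_SLICE_NS, .on_rq = 1 };
    struct toy_se task_c = { .slice = DEFAULT_SLICE_NS, .on_rq = 1 };

    cgroup_a.entities[0] = &task_a;
    cgroup_a.entities[1] = &task_b;
    cgroup_a.entities[2] = &task_c;
    cgroup_a.nr = 3;
    root.entities[0] = &group_se;
    root.nr = 1;

    recompute_min_slice(&cgroup_a);
    group_se.slice = cgroup_a.min_slice;   /* cases 2/3/4.1: nothing changes */
    printf("enqueued:  cgroup A min_slice = %llu ns\n",
           (unsigned long long)cgroup_a.min_slice);

    toy_dequeue_entity(&cgroup_a, &task_a);  /* 0-lag reached, se removed */
    printf("dequeued:  cgroup A min_slice = %llu ns\n",
           (unsigned long long)cgroup_a.min_slice);
    return 0;
}

Running it prints 100000 ns for cgroup A's min_slice while task A's delayed se is still enqueued, and 750000 ns once it has been removed, which is the bottom-up correction I mean in 4.2.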
I think only case 1 has this problem.
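
And for case 1 itself, this is roughly how task A could be given a 0.1 ms slice from user space, assuming the custom-slice interface under discussion (sched_attr.sched_runtime carrying the requested slice, in ns, for a fair task). Treat it as a sketch, not a reference: the struct below is a local copy of the uapi layout because older glibc has no sched_setattr() wrapper, and the local_* names are mine:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Local copy of the uapi sched_attr layout (named differently so it does
 * not clash with a libc that already ships the real definition). */
struct local_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    /* Assumption (see above): for fair tasks the custom-slice interface
     * reuses sched_runtime as the requested slice, in nanoseconds. */
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
    uint32_t sched_util_min;
    uint32_t sched_util_max;
};

static int local_sched_setattr(pid_t pid, struct local_sched_attr *attr,
                               unsigned int flags)
{
    return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
    struct local_sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy = 0;              /* SCHED_OTHER / SCHED_NORMAL */
    attr.sched_runtime = 100000;        /* request a 0.1 ms slice */

    if (local_sched_setattr(0, &attr, 0))   /* pid 0 == the calling task */
        perror("sched_setattr");
    return 0;
}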
Regards,
Yongmei.