static int tg_throttle_down(struct task_group *tg, void *data)
{
	struct rq *rq = data;
	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p;
+	struct rb_node *node;
+
+	cfs_rq->throttle_count++;
+	if (cfs_rq->throttle_count > 1)
+		return 0;
	/* group is entering throttled state, stop time */
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-		list_del_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
One concern here is that the PELT clock is seemingly frozen despite the
hierarchy being runnable. I've still not tracked down whether it'll
cause any problems once unthrottled and all throttled time is negated
from the pelt clock, but are there any concerns here?
I chose to do it this way because:
1. I expect that most of the time, if a task has to continue to run after its
cfs_rq gets throttled, the remaining time is relatively small, so it should
not cause much impact. But I agree there can be times when a task runs
relatively long;
2. I think the original intent of freezing a cfs_rq's pelt clock on throttle
is so that on unthrottle, it can restore its load (without its load being
decayed etc.). If I chose not to freeze its pelt clock on throttle
because some task is still running in kernel mode, then since some of this
cfs_rq's tasks are throttled, its load can become smaller and this can
impact its load on unthrottle.
I think neither approach is perfect, so I chose the simple one for now
:) Not sure if my thinking is correct though.
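For reference, the negation mentioned above happens via cfs_rq_clock_pelt();
paraphrasing from a current tree (kernel/sched/pelt.h, field names may differ
across versions), it looks roughly like:

static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
	/* while throttled, the clock stays pinned at throttled_clock_pelt */
	if (unlikely(cfs_rq->throttle_count))
		return cfs_rq->throttled_clock_pelt - cfs_rq->throttled_clock_pelt_time;

	/* once unthrottled, the accumulated throttled time keeps being subtracted */
	return rq_clock_pelt(rq_of(cfs_rq)) - cfs_rq->throttled_clock_pelt_time;
}

so as long as throttled_clock_pelt_time is accounted on unthrottle, the
throttled window should stay invisible to load tracking.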
Maybe this can be done at dequeue time, when cfs_rq->nr_queued on a
throttled_hierarchy() reaches 0.
Yes, this looks more consistent, maybe I should change to this approach.
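Something along these lines, perhaps (untested sketch; the helper name
throttled_freeze_pelt_clock() is made up here, and it would be called from the
dequeue path once nr_queued has been updated):

/*
 * Sketch: freeze the PELT clock only when the last queued entity leaves a
 * throttled cfs_rq, instead of unconditionally in tg_throttle_down().
 */
static void throttled_freeze_pelt_clock(struct cfs_rq *cfs_rq)
{
	struct rq *rq = rq_of(cfs_rq);

	if (!throttled_hierarchy(cfs_rq) || cfs_rq->nr_queued)
		return;

	/* nothing runnable remains here: stop its pelt clock now */
	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
	/* as tg_throttle_down() does today */
	list_del_leaf_cfs_rq(cfs_rq);
}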
+	list_del_leaf_cfs_rq(cfs_rq);
-		SCHED_WARN_ON(cfs_rq->throttled_clock_self);
-		if (cfs_rq->nr_queued)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
+	SCHED_WARN_ON(cfs_rq->throttled_clock_self);
+	if (cfs_rq->nr_queued)
+		cfs_rq->throttled_clock_self = rq_clock(rq);
+
+	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
+	/*
+	 * rq_lock is held, current is (obviously) executing this in kernelspace.
+	 *
+	 * All other tasks enqueued on this rq have their saved PC at the
+	 * context switch, so they will go through the kernel before returning
+	 * to userspace. Thus, there are no tasks-in-userspace to handle, just
+	 * install the task_work on all of them.
+	 */
+	node = rb_first(&cfs_rq->tasks_timeline.rb_root);
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		if (!entity_is_task(se))
+			goto next;
+
+		p = task_of(se);
+		task_throttle_setup_work(p);
+next:
+		node = rb_next(node);
+	}
+
+	/* curr is not in the timeline tree */
+	if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
I believe we can reach here from pick_next_task_fair() ->
check_cfs_rq_runtime() -> throttle_cfs_rq(), in which case cfs_rq->curr
will still be set despite the task being blocked, since put_prev_entity()
has not been called yet.
I believe there should be a check for task_on_rq_queued() here for the
current task.
Ah right, I'll see how to fix this.
Thanks,
Aaron
+		p = task_of(cfs_rq->curr);
+		task_throttle_setup_work(p);
	}
-	cfs_rq->throttle_count++;
	return 0;
}
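FWIW, the check being discussed could look something like this (untested,
just to illustrate the task_on_rq_queued() test on curr):

	/* curr is not in the timeline tree */
	if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
		p = task_of(cfs_rq->curr);
		/*
		 * On the pick_next_task_fair() -> check_cfs_rq_runtime()
		 * path, put_prev_entity() has not run yet, so curr can be a
		 * task that is already blocked/dequeued; skip it here.
		 */
		if (task_on_rq_queued(p))
			task_throttle_setup_work(p);
	}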
[..snip..]
--
Thanks and Regards,
Prateek