Re: [RFC PATCH v3 10/10] sched/fair: Throttle CFS tasks on return to userspace

From: Valentin Schneider
Date: Tue Jul 16 2024 - 08:46:33 EST


On 12/07/24 19:43, Peter Zijlstra wrote:
> On Thu, Jul 11, 2024 at 03:00:04PM +0200, Valentin Schneider wrote:
>
>> +static void throttle_one_task(struct cfs_rq *cfs_rq, struct task_struct *p)
>> {
>> + long task_delta, idle_task_delta;
>> + struct sched_entity *se = &p->se;
>> +
>> + list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
>>
>> + task_delta = 1;
>> + idle_task_delta = cfs_rq_is_idle(cfs_rq) ? 1 : 0;
>> +
>> + for_each_sched_entity(se) {
>> + cfs_rq = cfs_rq_of(se);
>> +
>> + if (!se->on_rq)
>> + return;
>> +
>> + dequeue_entity(cfs_rq, se, DEQUEUE_SLEEP);
>> + cfs_rq->h_nr_running -= task_delta;
>> + cfs_rq->idle_h_nr_running -= idle_task_delta;
>> +
>> + if (cfs_rq->load.weight) {
>> + /* Avoid re-evaluating load for this entity: */
>> + se = parent_entity(se);
>> + break;
>> + }
>> + }
>> +
>> + for_each_sched_entity(se) {
>> + cfs_rq = cfs_rq_of(se);
>> + /* throttled entity or throttle-on-deactivate */
>> + if (!se->on_rq)
>> + goto throttle_done;
>> +
>> + update_load_avg(cfs_rq, se, 0);
>> + se_update_runnable(se);
>> + cfs_rq->h_nr_running -= task_delta;
>> + cfs_rq->h_nr_running -= idle_task_delta;
>> + }
>> +
>> +throttle_done:
>> + /* At this point se is NULL and we are at root level*/
>> + sub_nr_running(rq_of(cfs_rq), 1);
>> }
>
> I know you're just moving code around, but we should look if we can
> share code between this and dequeue_task_fair().
>
> I have patches around this in that eevdf series I should send out again,
> I'll try and have a stab.
>

Looking at this again, couldn't this actually just be:

list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
dequeue_task_fair(rq, p, DEQUEUE_SLEEP);

?

Because the main deltas between throttle_one_task() and dequeue_task_fair()
are (other than the list_add()):
o Caring about throttled entities (!se->on_rq) - which AFAICT can't happen
after this patch, since non-empty cfs_rq's are kept enqueued
o Load tracking faff (util_est, an extra update_cfs_group())
o The set_next_buddy() thing, which will always be a nop because of
throttled_hierarchy()
o A rq->next_balance update
o hrtick_update()

I think some of these are omitted from throttle_cfs_rq() because the whole
cfs_rq is being taken out, but here we are genuinely taking just one task
out, so I think this does want to be pretty much dequeue_task_fair().