Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle

From: K Prateek Nayak
Date: Thu Mar 13 2025 - 23:28:39 EST


Hello Aaron,

On 3/13/2025 12:51 PM, Aaron Lu wrote:

[..snip..]


+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
static void throttle_cfs_rq_work(struct callback_head *work)
{
+ struct task_struct *p = container_of(work, struct task_struct,
+ sched_throttle_work);
+ struct sched_entity *se;
+ struct cfs_rq *cfs_rq;
+ struct rq *rq;
+ struct rq_flags rf;
+
+ WARN_ON_ONCE(p != current);
+ p->sched_throttle_work.next = &p->sched_throttle_work;
+
+ /*
+ * If task is exiting, then there won't be a return to userspace, so we
+ * don't have to bother with any of this.
+ */
+ if ((p->flags & PF_EXITING))
+ return;
+
+ rq = task_rq_lock(p, &rf);

nit. With CLASS(task_rq_lock, rq_guard)(p), you can fetch the rq with
"rq_guard.rq" and the "goto out_unlock" can be replaced with a simple
return.
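
Something along these lines is what I had in mind (untested sketch,
just to illustrate the guard usage; the task_rq_lock guard is the one
defined in kernel/sched/sched.h):

	static void throttle_cfs_rq_work(struct callback_head *work)
	{
		struct task_struct *p = container_of(work, struct task_struct,
						     sched_throttle_work);
		struct sched_entity *se;
		struct cfs_rq *cfs_rq;
		struct rq *rq;

		WARN_ON_ONCE(p != current);
		p->sched_throttle_work.next = &p->sched_throttle_work;

		/*
		 * If task is exiting, then there won't be a return to
		 * userspace, so we don't have to bother with any of this.
		 */
		if (p->flags & PF_EXITING)
			return;

		/* task_rq_unlock() happens on every return path below. */
		CLASS(task_rq_lock, rq_guard)(p);
		rq = rq_guard.rq;

		se = &p->se;
		cfs_rq = cfs_rq_of(se);

		/* Raced, forget */
		if (p->sched_class != &fair_sched_class)
			return;

		/*
		 * If not in limbo, then either replenish has happened or
		 * this task got migrated out of the throttled cfs_rq,
		 * move along.
		 */
		if (!cfs_rq->throttle_count)
			return;

		update_rq_clock(rq);
		WARN_ON_ONCE(!list_empty(&p->throttle_node));
		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
		resched_curr(rq);
	}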

+
+ se = &p->se;
+ cfs_rq = cfs_rq_of(se);
+
+ /* Raced, forget */
+ if (p->sched_class != &fair_sched_class)
+ goto out_unlock;
+
+ /*
+ * If not in limbo, then either replenish has happened or this task got
+ * migrated out of the throttled cfs_rq, move along
+ */
+ if (!cfs_rq->throttle_count)
+ goto out_unlock;
+
+ update_rq_clock(rq);
+ WARN_ON_ONCE(!list_empty(&p->throttle_node));
+ list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+ dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+ resched_curr(rq);
+
+out_unlock:
+ task_rq_unlock(rq, p, &rf);
}

void init_cfs_throttle_work(struct task_struct *p)
@@ -5873,32 +5914,81 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
return 0;
}

+static inline bool task_has_throttle_work(struct task_struct *p)
+{
+ return p->sched_throttle_work.next != &p->sched_throttle_work;
+}
+
+static inline void task_throttle_setup_work(struct task_struct *p)
+{
+ /*
+ * Kthreads and exiting tasks don't return to userspace, so adding the
+ * work is pointless
+ */
+ if ((p->flags & (PF_EXITING | PF_KTHREAD)))
+ return;
+
+ if (task_has_throttle_work(p))
+ return;
+
+ task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
+}
+
static int tg_throttle_down(struct task_group *tg, void *data)
{
struct rq *rq = data;
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+ struct task_struct *p;
+ struct rb_node *node;
+
+ cfs_rq->throttle_count++;
+ if (cfs_rq->throttle_count > 1)
+ return 0;

General question: Do we need the throttled_lb_pair() check in
can_migrate_task() with the per-task throttle? Moving a throttled task
to another CPU ensures that the task can run sooner and exit to
userspace as quickly as possible, and once the task dequeues, it will
remove itself from the list of fair tasks, making it unreachable for
the load balancer. Thoughts?
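
For reference, the check I'm referring to is this bit in
can_migrate_task() (paraphrasing mainline, trimmed):

	/*
	 * We do not migrate tasks that are:
	 * 1) throttled_lb_pair, or
	 * ...
	 */
	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
		return 0;

With the per-task throttle, a task on a throttled cfs_rq that hasn't
dequeued yet is still runnable in kernelspace, so letting the balancer
move it seems like it could only help it reach the userspace boundary
sooner.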


/* group is entering throttled state, stop time */
- if (!cfs_rq->throttle_count) {
- cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
- list_del_leaf_cfs_rq(cfs_rq);
+ cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+ list_del_leaf_cfs_rq(cfs_rq);

- SCHED_WARN_ON(cfs_rq->throttled_clock_self);
- if (cfs_rq->nr_queued)
- cfs_rq->throttled_clock_self = rq_clock(rq);
+ SCHED_WARN_ON(cfs_rq->throttled_clock_self);
+ if (cfs_rq->nr_queued)
+ cfs_rq->throttled_clock_self = rq_clock(rq);
+
+ WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
+ /*
+ * rq_lock is held, current is (obviously) executing this in kernelspace.
+ *
+ * All other tasks enqueued on this rq have their saved PC at the
+ * context switch, so they will go through the kernel before returning
+ * to userspace. Thus, there are no tasks-in-userspace to handle, just
+ * install the task_work on all of them.
+ */
+ node = rb_first(&cfs_rq->tasks_timeline.rb_root);
+ while (node) {
+ struct sched_entity *se = __node_2_se(node);
+
+ if (!entity_is_task(se))
+ goto next;
+
+ p = task_of(se);
+ task_throttle_setup_work(p);
+next:
+ node = rb_next(node);
+ }
+
+ /* curr is not in the timeline tree */
+ if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
+ p = task_of(cfs_rq->curr);
+ task_throttle_setup_work(p);
}
- cfs_rq->throttle_count++;

return 0;
}


[..snip..]

--
Thanks and Regards,
Prateek