Re: [RFC PATCH 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature

From: Mathieu Desnoyers
Date: Thu Oct 19 2023 - 10:49:12 EST


On 2023-10-19 09:28, Mathieu Desnoyers wrote:
On 2023-10-19 07:35, Chen Yu wrote:
[...]
+/*
+ * Returns true if adding the task utilization to the estimated
+ * utilization of the runnable tasks on @cpu does not exceed the
+ * capacity of @cpu.
+ *
+ * This considers only the utilization of _runnable_ tasks on the @cpu
+ * runqueue, excluding blocked and sleeping tasks. This is achieved by
+ * using the runqueue util_est.enqueued, and by estimating the capacity
+ * of @cpu based on arch_scale_cpu_capacity and arch_scale_thermal_pressure
+ * rather than capacity_of() because capacity_of() considers
+ * blocked/sleeping tasks in other scheduler classes.
+ *
+ * The utilization vs capacity comparison is done without the margin
+ * provided by fits_capacity(), because fits_capacity() is used to
+ * validate whether the utilization of a task fits within the overall
+ * capacity of a cpu, whereas this function validates whether the task
+ * utilization fits within the _remaining_ capacity of the cpu, which is
+ * more precise.
+ */
+static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
+                            int cpu)
+{
+    unsigned long total_util, capacity;
+
+    if (!sched_util_fits_capacity_active())
+        return false;
+    total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
+    capacity = arch_scale_cpu_capacity(cpu) - arch_scale_thermal_pressure(cpu);

scale_rt_capacity(cpu) could provide the remaining cpu capacity after subtracting
the side activity (rt tasks/thermal pressure/irq time); maybe it would be more accurate?

AFAIU, scale_rt_capacity(cpu) works similarly to capacity_of(cpu) and considers blocked and sleeping tasks in the rq->avg_rt.util_avg and rq->avg_dl.util_avg. I'm not sure about rq->avg_irq.util_avg and thermal_load_avg().

This goes against what is needed here: we need a utilization that only considers enqueued runnable tasks (excluding blocked and sleeping tasks). Or am I missing something?


I was wrong. Looking more closely at the dl and rt sched classes, unlike the fair sched class, they don't appear to take sleeping/blocked tasks into account in their util_avg. They just accumulate the rq util_sum and derive a rq util_avg from it. Likewise for thermal and irq.

So both capacity_of(cpu) and scale_rt_capacity(cpu) would appear to do what we need here, but AFAIU capacity_of(cpu) is based on a cached metric which is only updated once per jiffy or so, whereas scale_rt_capacity(cpu) computes the value when called.

Let me try using scale_rt_capacity(cpu) then.
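
For illustration only, here is a rough user-space sketch of the check with the remaining capacity derived from side activity. This is not kernel code: struct cpu_model, its fields and both helper names are hypothetical stand-ins, irq time scaling is left out for brevity, and the final "<=" comparison is an assumption based on the comment in the patch, since the quoted hunk stops before it.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-cpu scheduler state. */
struct cpu_model {
	unsigned long max_capacity;	/* stand-in for arch_scale_cpu_capacity(cpu) */
	unsigned long side_util;	/* assumed sum of rt, dl and thermal utilization */
	unsigned long cfs_enqueued;	/* stand-in for cfs.avg.util_est.enqueued */
};

/*
 * Capacity left for fair tasks once side activity is subtracted.
 * Returns 1 rather than 0 when side activity saturates the cpu, so
 * the result is never zero (an assumption of this sketch).
 */
static unsigned long model_remaining_capacity(const struct cpu_model *c)
{
	if (c->side_util >= c->max_capacity)
		return 1;
	return c->max_capacity - c->side_util;
}

/*
 * True if the enqueued fair utilization plus the task utilization
 * does not exceed the remaining capacity. No fits_capacity()-style
 * margin is applied, matching the intent described in the patch comment.
 */
static bool model_task_fits(const struct cpu_model *c, unsigned long task_util)
{
	unsigned long total_util = c->cfs_enqueued + task_util;

	return total_util <= model_remaining_capacity(c);
}
```

With max_capacity = 1024, side_util = 224 and cfs_enqueued = 500, the remaining capacity in this model is 800, so a task with a utilization of 300 fits and one with 301 does not.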

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com