Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT

From: Pavan Kondeti
Date: Tue Oct 23 2018 - 01:59:59 EST

Next message: zhong jiang: "[PATCH] rtlwifi: remove set but not used variable 'radiob_array_table' and 'radiob_arraylen'"
Previous message: Michal Hocko: "Re: [PATCH] mm,oom: Use timeout based back off."
In reply to: Vincent Guittot: "[PATCH v4 2/2] sched/fair: update scale invariance of PELT"
Next in thread: Vincent Guittot: "Re: [PATCH v4 2/2] sched/fair: update scale invariance of PELT"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Vincent,

On Fri, Oct 19, 2018 at 06:17:51PM +0200, Vincent Guittot wrote:
>
> /*
> + * The clock_pelt scales the time to reflect the effective amount of
> + * computation done during the running delta time but then sync back to
> + * clock_task when rq is idle.
> + *
> + *
> + * absolute time   | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16
> + * @ max capacity  ------******---------------******---------------
> + * @ half capacity ------************---------************---------
> + * clock pelt      | 1| 2|    3|    4| 7| 8| 9|   10|   11|14|15|16
> + *
> + */
> +void update_rq_clock_pelt(struct rq *rq, s64 delta)
> +{
> +
> +	if (is_idle_task(rq->curr)) {
> +		u32 divider = (LOAD_AVG_MAX - 1024 + rq->cfs.avg.period_contrib) << SCHED_CAPACITY_SHIFT;
> +		u32 overload = rq->cfs.avg.util_sum + LOAD_AVG_MAX;
> +		overload += rq->avg_rt.util_sum;
> +		overload += rq->avg_dl.util_sum;
> +
> +		/*
> +		 * Reflecting some stolen time makes sense only if the idle
> +		 * phase would be present at max capacity. As soon as the
> +		 * utilization of a rq has reached the maximum value, it is
> +		 * considered as an always running rq without idle time to
> +		 * steal. This potential idle time is considered as lost in
> +		 * this case. We keep track of this lost idle time compared to
> +		 * rq's clock_task.
> +		 */
> +		if (overload >= divider)
> +			rq->lost_idle_time += rq_clock_task(rq) - rq->clock_pelt;
> +

I am trying to understand this better. I believe we run into this scenario when
the frequency is limited due to thermal/userspace constraints. Let's say the
frequency is limited to Fmax/2. A task that is 50% busy at Fmax becomes 100%
busy when running at Fmax/2. The utilization builds up to 100% after several
periods. The clock_pelt runs at half the speed of clock_task, so we are losing
the idle time all along. What happens when the CPU enters idle for a short
duration and comes back to run this 100% utilization task?
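
To make the Fmax/2 scenario concrete, here is a minimal sketch of the two
clocks drifting apart during a running phase. It is only a userspace model:
struct model_rq, model_run() and model_lag() are hypothetical names, and only
cap_scale() and SCHED_CAPACITY_SHIFT mirror the kernel's fixed-point convention.

```c
#include <assert.h>
#include <stdint.h>

/* Same fixed-point convention as the kernel: 1024 means full capacity. */
#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
#define cap_scale(v, s)		((v) * (s) >> SCHED_CAPACITY_SHIFT)

/* Hypothetical model of a rq: just the two clocks, in nanoseconds. */
struct model_rq {
	int64_t clock_task;	/* wall time spent in task context */
	int64_t clock_pelt;	/* time scaled by compute capacity */
};

/* Advance both clocks while a task runs for delta ns at freq_cap/1024 of Fmax. */
static void model_run(struct model_rq *rq, int64_t delta, long freq_cap)
{
	assert(freq_cap > 0 && freq_cap <= SCHED_CAPACITY_SCALE);
	rq->clock_task += delta;
	rq->clock_pelt += cap_scale(delta, freq_cap);
}

/* How far clock_pelt lags behind clock_task after the running phase. */
static int64_t model_lag(const struct model_rq *rq)
{
	return rq->clock_task - rq->clock_pelt;
}
```

Running for 16ms at a frequency capacity of 512 (Fmax/2) advances clock_task
by 16ms but clock_pelt by only 8ms, so model_lag() reports 8ms. That lag is
the idle time that either gets stretched or gets accounted as lost once the
CPU goes idle.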

If the above block is not present, i.e. lost_idle_time is not tracked, we
stretch the idle time (since clock_pelt is synced to clock_task) and the
utilization drops. Right?

With the above block, we don't stretch the idle time. In fact, we don't
consider the idle time at all, because:

idle_time = now - last_time;

idle_time = (rq->clock_pelt - rq->lost_idle_time) - last_time
idle_time = (rq->clock_task - rq_clock_task + rq->clock_pelt_old) - last_time
idle_time = rq->clock_pelt_old - last_time

Here last_time is nothing but the last snapshot of rq->clock_pelt, taken when
the task went to sleep and the CPU consequently entered idle.
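
The derivation above can be checked with a small userspace model of the idle
branch. This is a sketch under assumptions, not the patch itself: struct
pelt_clocks, model_enter_idle() and model_now() are hypothetical names, the
'saturated' flag stands in for the overload >= divider test, and "now" is
taken to be clock_pelt - lost_idle_time as in the derivation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the rq fields used in the derivation (nanoseconds). */
struct pelt_clocks {
	int64_t clock_task;
	int64_t clock_pelt;
	int64_t lost_idle_time;
};

/* Idle branch of the quoted hunk; 'saturated' replaces overload >= divider. */
static void model_enter_idle(struct pelt_clocks *rq, bool saturated)
{
	if (saturated)
		rq->lost_idle_time += rq->clock_task - rq->clock_pelt;

	/* The rq is idle, sync to clock_task. */
	rq->clock_pelt = rq->clock_task;
}

/* "now" as assumed in the derivation: clock_pelt minus the lost idle time. */
static int64_t model_now(const struct pelt_clocks *rq)
{
	return rq->clock_pelt - rq->lost_idle_time;
}
```

With clock_task = 16ms, clock_pelt = 8ms and last_time = 8ms, the saturated
case gives model_now() - last_time = 0, i.e. no idle time is seen at all. The
non-saturated case gives 8ms, i.e. the stolen idle time is reflected by the
sync.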

Can you please explain the significance of the above block with an example?

> +
> +		/* The rq is idle, we can sync to clock_task */
> +		rq->clock_pelt = rq_clock_task(rq);
> +
> +
> +	} else {
> +		/*
> +		 * When a rq runs at a lower compute capacity, it will need
> +		 * more time to do the same amount of work than at max
> +		 * capacity: either because it takes more time to compute the
> +		 * same amount of work or because taking more time means
> +		 * sharing more often the CPU between entities.
> +		 * In order to be invariant, we scale the delta to reflect how
> +		 * much work has been really done.
> +		 * Running at lower capacity also means running longer to do
> +		 * the same amount of work and this results in stealing some
> +		 * idle time that will disturb the load signal compared to
> +		 * max capacity. This stolen idle time will be automatically
> +		 * reflected when the rq will be idle and the clock will be
> +		 * synced with rq_clock_task.
> +		 */
> +
> +		/*
> +		 * scale the elapsed time to reflect the real amount of
> +		 * computation
> +		 */
> +		delta = cap_scale(delta, arch_scale_freq_capacity(cpu_of(rq)));
> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu_of(rq)));
> +
> +		rq->clock_pelt += delta;

AFAICT, rq->clock_pelt is used for both utilization and load. So the load
also becomes a function of the CPU uarch now. Is this intentional?
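
To illustrate the concern, here is a minimal sketch of the two-step scaling
from the hunk above. scaled_delta() is a hypothetical helper, and the
capacity values are made-up examples (512 standing for a CPU with half the
compute capacity of the biggest one), not numbers from any real platform.

```c
#include <stdint.h>

/* Same fixed-point convention as the kernel: 1024 means full capacity. */
#define SCHED_CAPACITY_SHIFT	10
#define cap_scale(v, s)		((v) * (s) >> SCHED_CAPACITY_SHIFT)

/*
 * Scale a wall-clock delta (ns) first by the frequency capacity and then
 * by the CPU (uarch) capacity, in the same order as the quoted hunk.
 */
static int64_t scaled_delta(int64_t delta, long freq_cap, long cpu_cap)
{
	delta = cap_scale(delta, freq_cap);
	delta = cap_scale(delta, cpu_cap);
	return delta;
}
```

In this model, 10ms of running at full frequency on a CPU of capacity 512
advances clock_pelt by only 5ms. Since any signal computed from clock_pelt
sees the same scaled delta, the load would shrink with CPU capacity in the
same way the utilization does.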

Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
