Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()

From: Peter Zijlstra
Date: Thu Mar 30 2017 - 09:47:12 EST


On Thu, Mar 30, 2017 at 02:16:58PM +0200, Peter Zijlstra wrote:
> On Thu, Mar 30, 2017 at 04:21:08AM -0700, Paul Turner wrote:

> > - The naming here is really ambiguous:
> > "__accumulate_sum" -> "__accumulate_pelt_segments"?
>
> OK, I did struggle with that a bit too but failed to improve on it; I'll change it.
>
> > - Passing in "remainder" seems irrelevant to the sum accumulation. It would be
> > more clear to handle it from the caller.
>
> Well, this way we have all 3 delta parts in one function. I'll try it
> and see what it looks like though.
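(Aside, to spell out the three delta parts, assuming at least one period boundary is crossed; d1/d3 follow the renamed helper's signature in the patch below, "delta" means the raw time since the last update, and 1024us is the usual PELT period:)

	d1 = 1024 - sa->period_contrib    /* closes out the partially filled period   */
	d2 = (periods - 1) * 1024         /* the complete periods crossed in between  */
	d3 = delta - d1 - d2              /* the freshly opened, still partial period */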

> > This is super confusing. It only works because remainder already had
> > period_contrib aggregated _into_ it. We're literally computing:
> > remainder + period_contrib - period_contrib
>
> Correct; although I didn't find it too confusing. Could be because I'd
> been staring at this code for a few hours though.
>
> > We should just not call this in the !periods case and handle the remainder
> > below.
>
> I'll change it and see what it looks like.
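(With made-up round numbers, to make that concrete: sa->period_contrib = 400 and a fresh delta of 300 give 400 + 300 = 700, so periods == 0 and remainder == 700; the old early-return path then computes 700 - 400 = 300, i.e. just the delta that came in, which is why the call can simply be skipped when no period boundary is crossed.)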

How's this?

---
kernel/sched/fair.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 76f67b3e34d6..10d34498b5fe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2795,12 +2795,9 @@ static u64 decay_load(u64 val, u64 n)
 	return val;
 }
 
-static u32 __accumulate_sum(u64 periods, u32 period_contrib, u32 remainder)
+static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
 {
-	u32 c1, c2, c3 = remainder; /* y^0 == 1 */
-
-	if (!periods)
-		return remainder - period_contrib;
+	u32 c1, c2, c3 = d3; /* y^0 == 1 */
 
 	if (unlikely(periods >= LOAD_AVG_MAX_N))
 		return LOAD_AVG_MAX;
@@ -2861,8 +2858,8 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 	       unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
 	unsigned long scale_freq, scale_cpu;
+	u32 contrib = delta;
 	u64 periods;
-	u32 contrib;
 
 	scale_freq = arch_scale_freq_capacity(NULL, cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -2880,13 +2877,14 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 			decay_load(cfs_rq->runnable_load_sum, periods);
 		}
 		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
-	}
 
-	/*
-	 * Step 2
-	 */
-	delta %= 1024;
-	contrib = __accumulate_sum(periods, sa->period_contrib, delta);
+		/*
+		 * Step 2
+		 */
+		delta %= 1024;
+		contrib = __accumulate_pelt_segments(periods,
+				1024 - sa->period_contrib, delta);
+	}
 	sa->period_contrib = delta;
 
 	contrib = cap_scale(contrib, scale_freq);
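(For completeness, a rough standalone model of what the renamed helper accumulates; illustrative only. decay() stands in for the kernel's decay_load(), i.e. multiplication by y^n with y^32 ~= 0.5, and the sum over the middle periods is written as a naive loop just to keep the sketch obvious.)

	/*
	 * Illustrative model only: add up the three segments with their
	 * decay weights, oldest segment decayed the most.
	 */
	static u32 pelt_segments_model(u64 periods, u32 d1, u32 d3)
	{
		u64 sum = decay(d1, periods);		/* c1: d1 ages across all p periods      */

		for (u64 n = 1; n < periods; n++)	/* c2: the p-1 complete 1024us periods   */
			sum += decay(1024, n);

		return sum + d3;			/* c3: d3 is in the new period, y^0 == 1 */
	}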