Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

From: Mike Galbraith
Date: Fri Jul 03 2015 - 05:29:24 EST


On Fri, 2015-07-03 at 08:40 +0200, Mike Galbraith wrote:

> Hm. Seems what this load should like best is if we detect 1:N, skip all
> of the routine gyrations, ie move the N (workers) infrequently, expend
> search cycles frequently only on the 1 (dispatch).
>
> Ponder..

While taking a refresher peek at wake_wide(), it seems it's not really
paying attention when the waker of many is itself awakened. I wonder if
your load would see more benefit if it watched like so, rashly assuming
I didn't wreck it completely (iow, completely untested).
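
To make the "not paying attention" bit concrete, here is a rough userspace
sketch of the check as it stands before the patch. struct task,
wake_wide_old() and the explicit waker argument are stand-ins invented for
illustration (the kernel reads current and sd_llc_size directly), and any
flip counts fed to it are made-up numbers.

```c
/* Hypothetical stand-in for the one task_struct field the check reads. */
struct task {
	unsigned long wakee_flips;
};

/*
 * The pre-patch test, with "current" passed in explicitly as waker
 * and sd_llc_size passed in as factor.
 */
static int wake_wide_old(const struct task *waker, const struct task *wakee,
			 unsigned long factor)
{
	/* Only a wakee that is already somewhat hot is considered... */
	if (wakee->wakee_flips > factor) {
		/* ...and only if the waker is far hotter than that wakee. */
		if (waker->wakee_flips > factor * wakee->wakee_flips)
			return 1;
	}
	return 0;
}
```

With an assumed factor of 4, a dispatcher at 400 flips and a worker at 8,
dispatcher waking worker returns 1, while worker waking dispatcher returns 0.
Same pair, opposite answers depending on who is being awakened.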

---
kernel/sched/fair.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4586,10 +4586,23 @@ static void record_wakee(struct task_str
 		current->wakee_flips >>= 1;
 		current->wakee_flip_decay_ts = jiffies;
 	}
+	if (time_after(jiffies, p->wakee_flip_decay_ts + HZ)) {
+		p->wakee_flips >>= 1;
+		p->wakee_flip_decay_ts = jiffies;
+	}

 	if (current->last_wakee != p) {
 		current->last_wakee = p;
 		current->wakee_flips++;
+		/*
+		 * Flip the buddy as well. It's the ratio of flips
+		 * with a socket size decayed cutoff that determines
+		 * whether the pair are considered to be part of 1:N
+		 * or M*N loads of a size that we need to spread, so
+		 * ensure flips of both load components. The waker
+		 * of many will have many more flips than its wakees.
+		 */
+		p->wakee_flips++;
 	}
 }

@@ -4732,24 +4745,19 @@ static long effective_load(struct task_g

 static int wake_wide(struct task_struct *p)
 {
+	unsigned long max = max(current->wakee_flips, p->wakee_flips);
+	unsigned long min = min(current->wakee_flips, p->wakee_flips);
 	int factor = this_cpu_read(sd_llc_size);

 	/*
-	 * Yeah, it's the switching-frequency, could means many wakee or
-	 * rapidly switch, use factor here will just help to automatically
-	 * adjust the loose-degree, so bigger node will lead to more pull.
+	 * Yeah, it's a switching-frequency heuristic, and could mean the
+	 * intended many wakees/waker relationship, or rapidly switching
+	 * between a few. Use factor to try to automatically adjust such
+	 * that the load spreads when it grows beyond what will fit in llc.
 	 */
-	if (p->wakee_flips > factor) {
-		/*
-		 * wakee is somewhat hot, it needs certain amount of cpu
-		 * resource, so if waker is far more hot, prefer to leave
-		 * it alone.
-		 */
-		if (current->wakee_flips > (factor * p->wakee_flips))
-			return 1;
-	}
-
-	return 0;
+	if (min < factor)
+		return 0;
+	return max > min * factor;
 }

static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
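
For anyone who wants to poke at the heuristic without booting a kernel, the
following is a minimal userspace model of the patched accounting and check,
a sketch only. struct task, simulate() and MAX_WORKERS are hypothetical, the
waker and factor are passed explicitly instead of using current and
sd_llc_size, and the once-a-second decay of wakee_flips is left out entirely.

```c
#include <stddef.h>

#define MAX_WORKERS 64	/* arbitrary bound for this sketch */

/* Hypothetical model of the two task_struct fields the patch touches. */
struct task {
	struct task *last_wakee;
	unsigned long wakee_flips;
};

/* Patched accounting: a partner switch flips both waker and wakee. */
static void record_wakee(struct task *waker, struct task *wakee)
{
	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->wakee_flips++;
		wakee->wakee_flips++;
	}
}

/* Patched check: symmetric in waker and wakee. */
static int wake_wide(const struct task *waker, const struct task *wakee,
		     unsigned long factor)
{
	unsigned long a = waker->wakee_flips, b = wakee->wakee_flips;
	unsigned long hi = a > b ? a : b;
	unsigned long lo = a > b ? b : a;

	if (lo < factor)
		return 0;
	return hi > lo * factor;
}

/*
 * One dispatcher wakes nr_workers workers round-robin, and each worker
 * wakes the dispatcher back. Returns the wake_wide() decision for the
 * dispatcher waking worker 0, or -1 if nr_workers is out of range.
 */
static int simulate(unsigned int nr_workers, unsigned int rounds,
		    unsigned long factor)
{
	struct task dispatch = { NULL, 0 };
	struct task workers[MAX_WORKERS] = { { NULL, 0 } };
	unsigned int r, i;

	if (nr_workers == 0 || nr_workers > MAX_WORKERS)
		return -1;

	for (r = 0; r < rounds; r++) {
		for (i = 0; i < nr_workers; i++) {
			record_wakee(&dispatch, &workers[i]);
			record_wakee(&workers[i], &dispatch);
		}
	}
	return wake_wide(&dispatch, &workers[0], factor);
}
```

In this model, with an assumed factor of 4, simulate(16, 10, 4) returns 1
(176 dispatcher flips against 11 per worker), simulate(2, 10, 4) returns 0
because the ratio stays below factor, and simulate(16, 2, 4) returns 0
because the workers have not yet accumulated factor flips.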

