Re: sysbench throughput degradation in 4.13+
From: Peter Zijlstra
Date: Thu Sep 28 2017 - 08:38:08 EST
On Wed, Sep 27, 2017 at 01:58:20PM -0400, Rik van Riel wrote:
> @@ -5359,10 +5378,14 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  		unsigned long current_load = task_h_load(current);
>
>  		/* in this case load hits 0 and this LLC is considered 'idle' */
> -		if (current_load > this_stats.load)
> +		if (current_load > this_stats.max_load)
> +			return true;
> +
> +		/* allow if the CPU would go idle, regardless of LLC load */
> +		if (current_load >= target_load(this_cpu, sd->wake_idx))
>  			return true;
>
> -		this_stats.load -= current_load;
> +		this_stats.max_load -= current_load;
>  	}
>
>  	/*
> @@ -5375,10 +5398,6 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  	if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
>  		return false;
>
> -	/* if this cache has capacity, come here */
> -	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> -		return true;
> -
>  	/*
>  	 * Check to see if we can move the load without causing too much
>  	 * imbalance.
> @@ -5391,8 +5410,8 @@ wake_affine_llc(struct sched_domain *sd, struct task_struct *p,
>  	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
>  	prev_eff_load *= this_stats.capacity;
>
> -	this_eff_load *= this_stats.load + task_load;
> -	prev_eff_load *= prev_stats.load - task_load;
> +	this_eff_load *= this_stats.max_load + task_load;
> +	prev_eff_load *= prev_stats.min_load - task_load;
>
>  	return this_eff_load <= prev_eff_load;
>  }
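For reference, the arithmetic in that last hunk weighs each LLC's load by the other LLC's capacity and pads the previous LLC's side by half of the domain's imbalance_pct, so the wake is only allowed when this side still wins pessimistically (max_load here, min_load there). A stand-alone sketch of that comparison, assuming this_eff_load starts as 100 scaled by the previous LLC's capacity in the elided lines, as in the classic wake_affine(); the struct and all numbers below are illustrative, not the kernel's:

/* Illustrative user-space sketch, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

struct llc_stats {
	unsigned long min_load;
	unsigned long max_load;
	unsigned long capacity;
};

static bool allow_affine_wake(struct llc_stats this_stats,
			      struct llc_stats prev_stats,
			      unsigned long task_load,
			      unsigned int imbalance_pct)
{
	unsigned long this_eff_load, prev_eff_load;

	/* assumed elided setup: each side weighed by the other's capacity */
	this_eff_load = 100;
	this_eff_load *= prev_stats.capacity;

	/* pad the previous LLC by half the imbalance percentage */
	prev_eff_load = 100 + (imbalance_pct - 100) / 2;
	prev_eff_load *= this_stats.capacity;

	/* pessimistic here (max_load), optimistic there (min_load) */
	this_eff_load *= this_stats.max_load + task_load;
	prev_eff_load *= prev_stats.min_load - task_load;

	return this_eff_load <= prev_eff_load;
}

int main(void)
{
	struct llc_stats this_llc = { .min_load =  800, .max_load = 1200, .capacity = 4096 };
	struct llc_stats prev_llc = { .min_load = 2000, .max_load = 2500, .capacity = 4096 };

	/* 125 is a typical default imbalance_pct; the task weight is made up */
	printf("affine: %d\n", allow_affine_wake(this_llc, prev_llc, 300, 125));
	return 0;
}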
So I would really like a workload that needs this LLC/NUMA stuff, because
I much prefer the simpler 'on which of these two CPUs can I run soonest'
approach.
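To make that concrete, a minimal sketch of what such a comparison could look like, preferring an idle CPU and otherwise the shorter runqueue; the struct and helper are hypothetical stand-ins, not the scheduler's API:

/* Hypothetical sketch of "which CPU can run the wakee soonest". */
#include <stdbool.h>

struct cpu_state {
	bool idle;			/* CPU idle right now? */
	unsigned int nr_running;	/* tasks on its runqueue */
};

static int pick_soonest(int this_cpu, const struct cpu_state *this_state,
			int prev_cpu, const struct cpu_state *prev_state)
{
	/* an idle CPU can run the wakee immediately */
	if (this_state->idle && !prev_state->idle)
		return this_cpu;
	if (prev_state->idle && !this_state->idle)
		return prev_cpu;

	/* otherwise the shorter runqueue should get to the task sooner */
	return this_state->nr_running <= prev_state->nr_running ?
		this_cpu : prev_cpu;
}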