Re: Abysmal scheduler performance in Linus' tree?
From: Peter Zijlstra
Date: Wed Sep 06 2017 - 06:44:31 EST

On Wed, Sep 06, 2017 at 11:18:46AM +0100, Chris Wilson wrote:
> > +static void get_llc_stats(struct llc_stats *stats, int cpu)
> > +{
> > + struct sched_domain_shared *sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > +
> > + if (!sds) {
> > + memset(&stats, 0, sizeof(*stats));
>
> Yes, I even sent you a mail about it ;)

Bah, too much email, sorry :-(

> > + /*
> > + * The has_capacity stuff is not SMT aware, but by trying to balance
> > + * the nr_running on both ends we try and fill the domain at equal
> > + * rates, thereby first consuming cores before siblings.
> > + */
> > +
> > + /* if the old cache has capacity, stay there */
> > + if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
> > + return false;
> > +
> > + /* if this cache has capacity, come here */
> > + if (this_stats.has_capacity && this_stats.nr_running < prev_stats.nr_running+1)
> > + return true;
>
> This is still not working as intended, it should be
>
> if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
> return true;
>
> to fix the regression.

Argh, you're quite right. Let me do a patch for that.
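
To make the difference between the two conditions concrete, here is a minimal userspace sketch of just the "come here" check. It is not the kernel code: the struct below is a hypothetical stand-in with only the two fields the comparison reads, the unsigned type is an assumption, and the function names are made up for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the patch's struct llc_stats; only the two
 * fields used by the comparison are modelled, and the field types are
 * assumptions.
 */
struct llc_stats {
	unsigned int nr_running;
	bool has_capacity;
};

/*
 * Condition as posted: this < prev + 1, i.e. this <= prev.
 * This pulls the task over even when both caches are equally loaded.
 */
static bool pull_as_posted(struct llc_stats this_stats,
			   struct llc_stats prev_stats)
{
	return this_stats.has_capacity &&
	       this_stats.nr_running < prev_stats.nr_running + 1;
}

/*
 * Condition as suggested: this + 1 < prev.
 * This pulls only if, after counting the incoming task, this cache
 * would still run fewer tasks than the previous one.
 */
static bool pull_as_suggested(struct llc_stats this_stats,
			      struct llc_stats prev_stats)
{
	return this_stats.has_capacity &&
	       this_stats.nr_running + 1 < prev_stats.nr_running;
}
```

With 2 tasks on each side the posted condition returns true and the suggested one returns false; with 1 task here and 3 on the previous cache both return true.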