Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus

From: Patrick Bellasi
Date: Thu May 24 2018 - 04:11:55 EST


On 23-May 16:18, Waiman Long wrote:
> On 05/23/2018 01:34 PM, Patrick Bellasi wrote:
> > Hi Waiman,
> >
> > On 17-May 16:55, Waiman Long wrote:
> >
> > [...]
> >
> >> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
> >> int ndoms = 0; /* number of sched domains in result */
> >> int nslot; /* next empty doms[] struct cpumask slot */
> >> struct cgroup_subsys_state *pos_css;
> >> + bool root_load_balance = is_sched_load_balance(&top_cpuset);
> >>
> >> doms = NULL;
> >> dattr = NULL;
> >> csa = NULL;
> >>
> >> /* Special case for the 99% of systems with one, full, sched domain */
> >> - if (is_sched_load_balance(&top_cpuset)) {
> >> + if (root_load_balance && !top_cpuset.isolation_count) {
> > Perhaps I'm missing something but, it seems to me that, when the two
> > conditions above are true, then we are going to destroy and rebuild
> > the exact same scheduling domains.
> >
> > IOW, on 99% of systems where:
> >
> > is_sched_load_balance(&top_cpuset)
> > top_cpuset.isolation_count = 0
> >
> > since boot time and forever, then every time we update a value for
> > cpuset.cpus we keep rebuilding the same SDs.
> >
> > It's not strictly related to this patch, the same already happens in
> > mainline based just on the first condition, but since you are extending
> > that optimization, perhaps you can tell me where I'm possibly wrong or
> > which cases I'm not considering.
> >
> > I'm interested mainly because on Android systems those conditions
> > are always true and we see SDs rebuilds every time we write
> > something in cpuset.cpus, which ultimately accounts for almost all the
> > 6-7[ms] time required for the write to return, depending on the CPU
> > frequency.
> >
> > Cheers Patrick
> >
> Yes, that is true. I will look into how to further optimize this. Thanks
> for the suggestion.

FWIW, the following is my take on top of your series.

With the following patch applied I see the average execution time of
rebuild_sched_domains_locked() drop from 1.4[ms] to 40[us] while
running 60 /tg1/cpuset.cpus switches in a loop on a Juno R2 Arm board
using the performance cpufreq governor.
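
To illustrate the general idea outside of the kernel, here is a minimal
user-space sketch. It is not the kernel patch itself and does not use
any kernel API: the toy_* names and the 64-bit mask type are made up
for this example. It only shows the "skip the rebuild when the
resulting domain is unchanged" pattern, under the assumption that a
single sched domain spans all the CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical bookkeeping for the single, full sched domain. */
struct toy_sched_state {
	toy_cpumask_t cur_domain;	/* mask the domain was last built for */
	bool built;			/* false until the first build */
	unsigned int rebuilds;		/* number of (expensive) rebuilds done */
};

/*
 * Called after a cpuset.cpus-like update with the mask the single
 * sched domain would cover. Returns true if a rebuild was performed,
 * false if it was skipped because nothing changed.
 */
static bool toy_update_cpus(struct toy_sched_state *st, toy_cpumask_t new_domain)
{
	assert(st != NULL);

	/* Same domain as before: destroying and rebuilding it is wasted work. */
	if (st->built && st->cur_domain == new_domain)
		return false;

	/* The expensive destroy + rebuild would happen here. */
	st->cur_domain = new_domain;
	st->built = true;
	st->rebuilds++;
	return true;
}
```

In this toy model the first update always builds the domain, a repeated
update with the same mask returns false without touching the rebuild
counter, and only an update that changes the mask pays the rebuild cost
again.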

---8<---