On Wed, Jun 27, 2018 at 05:41:22PM +0200, Dietmar Eggemann wrote:
> On 06/22/2018 04:36 PM, Morten Rasmussen wrote:
> > On Fri, Jun 22, 2018 at 09:22:22AM +0100, Quentin Perret wrote:
> > > What would happen if you hotplugged an entire cluster? You'd lose the
> > > SD_ASYM_CPUCAPACITY flag but keep the static key, is that right? Should
> > > we care?
> > I don't think we should care. The static key enables additional checks
> > and tweaks, but AFAICT none of them requires SD_ASYM_CPUCAPACITY to be
> > set, and they should all have no effect if that is the case. I added
> > the static key just to avoid the overhead on systems where they would
> > have no effect. At least that is the intention; I could of course have
> > broken things by mistake.
> I tend to agree for misfit, but it would be easy to just add a
> static_branch_disable_cpuslocked() into the else path of if (enable).
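Something like the below, I assume (untested sketch only; the
asym_cpu_capacity_update() name and the enable flag are just my reading
of how the patch is structured):

static void asym_cpu_capacity_update(bool enable)
{
        /* Flip the static key in both directions, not only on enable. */
        if (enable)
                static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
        else
                static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
}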
Depending on how we address the exclusive cpuset mess Quentin pointed
out, it isn't as simple as that. As it is with our current
not-yet-posted patches we would never remove the SD_ASYM_CPUCAPACITY
flag, so we would never do a static_branch_disable_cpuslocked().
However, I'm currently thinking that we should probably set the flag
according to the cpus in each root_domain. If we do that, we can't
disable the static_key just because one root_domain is SMP; we would
have to track whether _any_ root_domain (exclusive cpuset topology) has
the flag set, e.g. something like the sketch below.
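Untested sketch, ignoring the locking (rd_has_asym_capacity() is a
made-up helper; the real check would have to be derived from each
root_domain's span):

static void asym_cpu_capacity_maybe_disable(void)
{
        bool asym = false;
        int cpu;

        for_each_online_cpu(cpu) {
                /* rd_has_asym_capacity() is hypothetical */
                if (rd_has_asym_capacity(cpu_rq(cpu)->rd)) {
                        asym = true;
                        break;
                }
        }

        if (!asym)
                static_branch_disable_cpuslocked(&sched_asym_cpucapacity);
}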
Is it worth it to iterate over all the exclusive cpusets, with all the
locking implications, to disable the static_key in the corner case where
exclusive cpusets are set up for all cpu capacities in the system?
I'm tempted to say we shouldn't care in this situation.

> > Setting the flags correctly in the three cluster example would
> > require knowledge about the cpuset configuration, which we don't have
> > in the arch code, so SD_ASYM_CPUCAPACITY flag detection would have to
> > be done by the sched_domain build code. However, not setting the flag
> > according to the actual members of the exclusive cpuset means that
> > homogeneous sched_domains might have SD_ASYM_CPUCAPACITY set, enabling
> > potentially wrong scheduling decisions.
> We could easily pass the CPU as an argument to all these
> sched_domain_flags_f functions:
>
> -typedef int (*sched_domain_flags_f)(void);
> +typedef int (*sched_domain_flags_f)(int cpu);
>
> In this case, the arch-specific flag functions on a sched domain (sd)
> level could use the corresponding sched_domain_mask_f function to
> iterate over the span of the sd seen by the CPU instead of over all
> online cpus.
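IIUC, you mean something along these lines (sketch; cpu_cpu_flags() is
a made-up name and arch_scale_cpu_capacity() is just for illustration):

static int cpu_cpu_flags(int cpu)
{
        int i;

        /* Walk the span this CPU sees at this level (DIE: cpu_cpu_mask()). */
        for_each_cpu(i, cpu_cpu_mask(cpu)) {
                if (arch_scale_cpu_capacity(NULL, i) !=
                    arch_scale_cpu_capacity(NULL, cpu))
                        return SD_ASYM_CPUCAPACITY;
        }

        return 0;
}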
Yeah, but I'm afraid that doesn't solve the problem. The
sched_domain_mask_f function doesn't tell us anything about any
exclusive cpusets. We need sched_domain_span(sd), which is
sched_domain_mask_f & cpu_map, where cpu_map is the cpumask of the
exclusive cpuset. So we either need to pass the sched_domain_span() mask
through sched_domain_flags_f or work our way back to that information
based on the cpu id. I'm not sure if the latter is possible.
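For reference, the span is currently derived roughly like this in the
sched_domain build code (quoting from memory, so treat it as a sketch):

/*
 * The topology level mask is restricted to the root_domain's cpu_map,
 * and cpu_map is what carries the exclusive cpuset information.
 */
cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));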
> We could also say that systems with 2 clusters with the same uArch and
> the same max CPU frequency plus additional clusters are insane, like we
> e.g. do with the Energy Model and CPUs with different uArchs within a
> frequency domain?
I fail to get the point here. Are you saying that SMP is insane? ;-)
We can actually end up with this problem just by hotplugging too. If you
unplug the entire big cluster in the three cluster example above, you
preserve DIE level which would have SD_ASYM_CPUCAPACITY set even though
we only have little cpus left.
> > As I see it, we have two choices: 1) Set the flags correctly for
> > exclusive cpusets, which means some additional "fun" in the
> > sched_domain hierarchy set-up, or 2) ignore it and make sure that
> > setting
> I assume you refer to this cpu parameter for sched_domain_flags_f
> under 1).
No, what I had in mind was to let sd_init() set the SD_ASYM_CPUCAPACITY
flag, as the arch code doesn't know about the exclusive cpusets, as
discussed above. In the meantime I have thought about a different
approach, where sd_init() only disables the flag when it is no longer
needed due to exclusive cpusets, along the lines of the sketch below.
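Completely untested sketch (capacities_uniform() is a made-up helper
that would check whether all cpus in the span have the same capacity):

/* In sd_init(), after the span has been established: */
cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));

/*
 * The arch code may request SD_ASYM_CPUCAPACITY based on all online
 * cpus, but the actual span (restricted by the exclusive cpuset's
 * cpu_map) can still be symmetric, so drop the flag in that case.
 */
if ((sd->flags & SD_ASYM_CPUCAPACITY) &&
    capacities_uniform(sched_domain_span(sd)))
        sd->flags &= ~SD_ASYM_CPUCAPACITY;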