Re: [PATCH v2] cpuset sched_load_balance flag

From: Andrew Morton
Date: Wed Oct 10 2007 - 22:33:46 EST


On Sat, 06 Oct 2007 02:47:47 -0700 Paul Jackson <pj@xxxxxxx> wrote:

> From: Paul Jackson <pj@xxxxxxx>
>
> Add a new per-cpuset flag called 'sched_load_balance'.
>
> When enabled in a cpuset (the default value) it tells the kernel
> scheduler that the scheduler should provide the normal load
> balancing on the CPUs in that cpuset, sometimes moving tasks
> from one CPU to a second CPU if the second CPU is less loaded
> and if that task is allowed to run there.
>
> When disabled (write "0" to the file) then it tells the kernel
> scheduler that load balancing is not required for the CPUs in
> that cpuset.
>
> Now even if this flag is disabled for some cpuset, the kernel
> may still have to load balance some or all the CPUs in that
> cpuset, if some overlapping cpuset has its sched_load_balance
> flag enabled.
>
> If there are some CPUs that are not in any cpuset whose
> sched_load_balance flag is enabled, the kernel scheduler will
> not load balance tasks to those CPUs.
>
> Moreover the kernel will partition the 'sched domains'
> (non-overlapping sets of CPUs over which load balancing is
> attempted) into the finest granularity partition that it can
> find, while still keeping any two CPUs that are in the same
> sched_load_balance enabled cpuset in the same element of the
> partition.
>
> This serves two purposes:
> 1) It provides a mechanism for real time isolation of some CPUs, and
> 2) it can be used to improve performance on systems with many CPUs
> by supporting configurations in which load balancing is not done
> across all CPUs at once, but rather only done in several smaller
> disjoint sets of CPUs.
>
> This mechanism replaces the earlier overloading of the per-cpuset
> flag 'cpu_exclusive', which overloading was removed in an earlier
> patch: cpuset-remove-sched-domain-hooks-from-cpusets
>
> See further the Documentation and comments in the code itself.
>
> ...
>
> +static void rebuild_sched_domains(void)
> +{
> + struct kfifo *q; /* queue of cpusets to be scanned */
> + struct cpuset *cp; /* scans q */
> + struct cpuset **csa; /* array of all cpuset ptrs */
> + int csn; /* how many cpuset ptrs in csa so far */
> + int i, j, k; /* indices for partition finding loops */
> + cpumask_t *doms; /* resulting partition; i.e. sched domains */
> + int ndoms; /* number of sched domains in result */
> + int nslot; /* next empty doms[] cpumask_t slot */
> +
> + q = NULL;
> + csa = NULL;
> + doms = NULL;
> +
> + /* Special case for the 99% of systems with one, full, sched domain */
> + if (is_sched_load_balance(&top_cpuset)) {
> + ndoms = 1;
> + doms = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
> + *doms = top_cpuset.cpus_allowed;

We generally only excuse a failure to check kmalloc()'s return value when
the code runs only on the bootup path. But this code is called at other
times too.
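
For illustration, here is a minimal userspace sketch of the kind of check
meant here. It is not the patch's code: malloc() stands in for kmalloc(),
and the cpumask_t stand-in, alloc_mask(), fail_next_alloc and
build_single_domain() are all hypothetical names made up for this example.
The point is only that the allocation result is tested before it is
dereferenced, and that the caller gets -ENOMEM back on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's cpumask_t. */
typedef struct { unsigned long bits[4]; } cpumask_t;

/* Hypothetical hook so the allocation failure path can be exercised. */
static int fail_next_alloc;

static void *alloc_mask(size_t size)
{
	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return NULL;	/* simulate an out-of-memory condition */
	}
	return malloc(size);
}

/*
 * Build the "one, full, sched domain" partition.
 * Returns 0 and fills *doms_out / *ndoms_out on success.
 * Returns -ENOMEM and leaves both outputs untouched on failure.
 */
static int build_single_domain(const cpumask_t *all_cpus,
			       cpumask_t **doms_out, int *ndoms_out)
{
	cpumask_t *doms;

	assert(all_cpus && doms_out && ndoms_out);

	doms = alloc_mask(sizeof(*doms));
	if (!doms)		/* check before the first dereference */
		return -ENOMEM;

	*doms = *all_cpus;
	*doms_out = doms;
	*ndoms_out = 1;
	return 0;
}
```

In a void function such as rebuild_sched_domains() there is no return
value to propagate, so the equivalent there would presumably be to bail
out to a cleanup label and leave the current domains in place.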

>
> static int arch_init_sched_domains(const cpumask_t *cpu_map)
> {
> - cpumask_t cpu_default_map;
> - int err;
> -
> - /*
> - * Setup mask for cpus without special case scheduling requirements.
> - * For now this just excludes isolated cpus, but could be used to
> - * exclude other special cases in the future.
> - */
> - cpus_andnot(cpu_default_map, *cpu_map, cpu_isolated_map);
> + ndoms_cur = 1;
> + doms_cur = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
> + cpus_andnot(*doms_cur, *cpu_map, cpu_isolated_map);

> - err = build_sched_domains(&cpu_default_map);
> -
> - return err;
> + return build_sched_domains(doms_cur);
> }

Ditto: the kmalloc() result for doms_cur is not checked here either.


It's a fairly minor thing really, but children might be watching..