Re: RT sched: cpupri_vec lock contention with def_root_domain and no load balance

From: Dimitri Sivanich
Date: Fri Nov 21 2008 - 16:18:21 EST


Hi Greg and Max,

On Fri, Nov 21, 2008 at 12:04:25PM -0800, Max Krasnyansky wrote:
> Hi Greg,
>
> I attached a debug instrumentation patch for Dimitri to try. I'll clean it up,
> add the things you requested, and resubmit properly some time next week.
>

We added Max's debug patch to our kernel and ran Max's Trace 3 scenario, but we do not see a NULL sched-domain remain attached; see my comments below.


mount -t cgroup cpuset -ocpuset /cpusets/

# from within /cpusets, create one child cpuset per cpu:
for i in 0 1 2 3; do mkdir par$i; echo $i > par$i/cpuset.cpus; done

kernel: cpusets: rebuild ndoms 1
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpusets: rebuild ndoms 1
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpusets: rebuild ndoms 1
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpusets: rebuild ndoms 1
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0

echo 0 > cpuset.sched_load_balance
kernel: cpusets: rebuild ndoms 4
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 1 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 2 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 3 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: CPU0 root domain default
kernel: CPU0 attaching NULL sched-domain.
kernel: CPU1 root domain default
kernel: CPU1 attaching NULL sched-domain.
kernel: CPU2 root domain default
kernel: CPU2 attaching NULL sched-domain.
kernel: CPU3 root domain default
kernel: CPU3 attaching NULL sched-domain.
kernel: CPU3 root domain e0000069ecb20000
kernel: CPU3 attaching sched-domain:
kernel: domain 0: span 3 level NODE
kernel: groups: 3
kernel: CPU2 root domain e000006884a00000
kernel: CPU2 attaching sched-domain:
kernel: domain 0: span 2 level NODE
kernel: groups: 2
kernel: CPU1 root domain e000006884a20000
kernel: CPU1 attaching sched-domain:
kernel: domain 0: span 1 level NODE
kernel: groups: 1
kernel: CPU0 root domain e000006884a40000
kernel: CPU0 attaching sched-domain:
kernel: domain 0: span 0 level NODE
kernel: groups: 0


This is the way sched_load_balance is supposed to work: you need to set sched_load_balance=0 in every cpuset that contains a cpu you want balancing disabled on, otherwise some balancing will still happen on that cpu.
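In command form (a sketch, using the /cpusets mount and the parX hierarchy created above; parN here is a placeholder for whichever child cpuset holds the cpu in question):

echo 0 > /cpusets/cpuset.sched_load_balance # the root cpuset covers all cpus
echo 0 > /cpusets/parN/cpuset.sched_load_balance # child cpuset containing the cpu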

So in addition to the top (root) cpuset, we need to set sched_load_balance to '0' in the parX cpusets. That turns off load balancing for the cpus in question (thereby attaching a NULL sched-domain). When we do that for just par3, we get the following:

echo 0 > par3/cpuset.sched_load_balance
kernel: cpusets: rebuild ndoms 3
kernel: cpuset: domain 0 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 1 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: cpuset: domain 2 cpumask 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,0
kernel: CPU3 root domain default
kernel: CPU3 attaching NULL sched-domain.

So the def_root_domain is now attached for CPU 3, and we do have a NULL sched-domain, which is what we expect for a cpu with load balancing turned off. If we turned sched_load_balance off ('0') in each of the other cpusets (par0-2) as well, each of those cpus would also have a NULL sched-domain attached; a sketch of the remaining steps follows.
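
For completeness (run from within /cpusets, as above):

echo 0 > par0/cpuset.sched_load_balance
echo 0 > par1/cpuset.sched_load_balance
echo 0 > par2/cpuset.sched_load_balance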