Re: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

From: Lei Wen
Date: Fri Feb 21 2014 - 02:28:59 EST


Mike,

On Fri, Feb 21, 2014 at 1:51 PM, Mike Galbraith <bitbucket@xxxxxxxxx> wrote:
> On Fri, 2014-02-21 at 10:23 +0800, Lei Wen wrote:
>> A cpu that is put into quiescent mode removes itself
>> from the kernel's sched_domain, and wants others not to disturb
>> its running task. But the current scheduler does not check whether
>> that cpu is set to such a mode, and still insists that the quiescent
>> cpu respond to the nohz load balance.
>
> Let's isolate some CPUs.
>
> Setup:
> /-----"system" CPU0
> \
> \---"rtcpus" CPUs1-3
>
> crash> runqueues
> PER-CPU DATA TYPE:
> struct rq runqueues;
> PER-CPU ADDRESSES:
> [0]: ffff88022fc12c00
> [1]: ffff88022fc92c00
> [2]: ffff88022fd12c00
> [3]: ffff88022fd92c00
> crash> struct rq ffff88022fd92c00 | grep sd
> sd = 0x0, <== yup, CPU3 is isolated bit of silicon
> crash> struct rq ffff88022fd92c00 | grep rd
> rd = 0xffffffff81bffe60 <def_root_domain>,
> crash> struct -x root_domain 0xffffffff81bffe60
> ...
> span = {{
> bits = {0xe} <== "rtcpus"
> }},
> crash> struct rq ffff88022fc12c00 | grep rd
> rd = 0xffff8802242c5800,
> crash> struct -x root_domain 0xffff8802242c5800
> ...
> span = {{
> bits = {0x1} <== "system"
> }},
>
> Turn off load balancing in "system" as well now, CPU0 loses its 'sd',
> and becomes an isolated island identical to "rtcpus" CPUs, and thus..
>
> span = {{
> bits = {0xf} <== oh darn
> }},
>
> .."system" and "rtcpus" merge, all CPUs having NULL sd, as they are now
> all remote silicon islands, but they now also share rd again, as if you
> had never diddled domains in the first place.

Great catch!
Actually, the setup I experimented with was:
1. disable load balancing in the top cpuset
2. put cpus 0-2 into "system" and enable load balancing there
3. put cpu 3 into "rt" and disable load balancing there

With this setup, the root domain span seen by cpus 0-2 always
covers [0-2], as you also mentioned.
And it is true that if I disable load balancing in "system" as well,
I see the span masks get merged.
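
For reference, the span values in the crash output above can be decoded
with a small userspace helper. This is only a sketch, assuming the mask
fits in one word; span_to_list() is a hypothetical name, not a kernel
function.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Userspace illustration only: turn a one-word span mask such as 0xe
 * into a cpu list such as "1-3". Returns the number of characters
 * written (not counting the trailing NUL).
 */
static size_t span_to_list(unsigned long bits, char *buf, size_t len)
{
	const int ncpus = (int)(sizeof(bits) * 8);
	size_t used = 0;
	int cpu = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	while (cpu < ncpus) {
		int start, end, n;

		/* Skip cpus that are not in the span. */
		if (!((bits >> cpu) & 1UL)) {
			cpu++;
			continue;
		}

		/* Collect a run of consecutive cpus. */
		start = cpu;
		while (cpu < ncpus && ((bits >> cpu) & 1UL))
			cpu++;
		end = cpu - 1;

		if (start == end)
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", start);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", start, end);

		/* Stop if the output buffer is too small. */
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```

With this, 0xe gives "1-3" ("rtcpus"), 0x1 gives "0" ("system"), and the
merged 0xf gives "0-3".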

So how about the change below?
+ if (!this_rq()->sd)
+ return;
This assumes an isolated cpu loses its sd. Could you help
confirm that from crash too?
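
To make the intent of that check concrete, here is a small userspace
model of it. This is only a sketch and not the kernel code: struct
rq_model, struct sd_model and idle_balance_model() are hypothetical
names, and the counter stands in for the actual balancing work.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_domain. */
struct sd_model {
	int level;
};

/* Hypothetical stand-in for struct rq; only the fields needed here. */
struct rq_model {
	struct sd_model *sd;		/* NULL when the cpu is isolated */
	unsigned int balance_runs;	/* counts balance passes performed */
};

/*
 * Model of the proposed guard: a cpu whose runqueue has no sched
 * domain attached skips the idle balance work entirely.
 */
static void idle_balance_model(struct rq_model *rq)
{
	if (!rq->sd)
		return;

	/* Placeholder for the real balancing work. */
	rq->balance_runs++;
}
```

In this model a runqueue with sd == NULL never counts a balance pass,
while one with a domain attached does. In the merged case you showed,
where all cpus have a NULL sd, every cpu would take the early return.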

Or do you think it is wrong to merge the root domains when the
"system" group disables load balancing?

Thanks,
Lei