Re: [RFC] sched: CPU topology try

From: Preeti U Murthy
Date: Tue Jan 07 2014 - 11:55:24 EST

On 01/07/2014 06:01 PM, Vincent Guittot wrote:
> On 7 January 2014 11:39, Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx> wrote:
>> On 01/07/2014 03:20 PM, Peter Zijlstra wrote:
>>> On Tue, Jan 07, 2014 at 03:10:21PM +0530, Preeti U Murthy wrote:
>>>> What if we want to add arch specific flags to the NUMA domain? Currently
>>>> with Peter's patch: and this patch,
>>>> the arch can modify the sd flags of the topology levels till just before
>>>> the NUMA domain. In sd_init_numa(), the flags for the NUMA domain get
>>>> initialized. We need to perhaps call into arch here to probe for
>>>> additional flags?
>>> What are you thinking of? I was hoping all NUMA details were captured in
>>> the distance table.
>>> It's far easier to talk of specifics in this case.
>> If the processor can be core gated, then there is very little power
>> savings that we could yield from consolidating all the load onto a
>> single node in a NUMA domain. 6 cores on one node or 3 cores each on two
>> nodes, the power is drawn by 6 cores in all. So I was thinking under
>> this circumstance we might want to set the SD_SHARE_POWERDOMAIN flag at
>> the NUMA domain and spread the load if it favours the workload.
> The policy of keeping the tasks running on cores that are close (same
> node) to the memory, is the more power efficient, isn't it ? so it's
> probably more about where to place the memory than about where to
> place the tasks ?

Yes, this is another point. One of the reasons that we try to consolidate
load onto cores is that on Power8 systems most of the power management is
at the core level. Node level cpuidle states are usually entered only on
fully idle systems, because of the overhead involved in exiting these
idle states, as I mentioned in reply to this thread.

Another point against node level idle states, which could for instance
include flushing a large shared cache, is that if we try to consolidate
the load onto nodes, we must also consolidate the memory pages at the
same time. Otherwise performance will be hurt severely by re-fetching
the pages that were flushed, compared to core level idle states.
Core level idle power management could include flushing the L2 cache,
which is still ok for performance because re-fetching pages into this
cache has relatively low overhead and, depending on the arch, the
power savings obtained could be worth the overhead.
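
To make the "6 cores on one node or 3 cores each on two nodes" arithmetic
quoted above concrete, here is a rough toy model. It is not kernel code, and
every number and name in it (the wattages, system_power(), node_idle_state)
is a made-up assumption for illustration only. It ignores idle-state exit
latency and memory placement, which are the other costs discussed in this
mail.

```python
# Toy power model; all constants are hypothetical placeholders, not
# measurements of Power8 or any other hardware.
CORE_ACTIVE_W = 10.0  # assumed draw of a busy core
CORE_GATED_W = 0.5    # assumed residual draw of a core-gated idle core
NODE_UNCORE_W = 8.0   # assumed draw of a node's shared logic (e.g. shared cache)


def system_power(busy_per_node, cores_per_node=6, node_idle_state=False):
    """Return total power for a given count of busy cores on each node.

    node_idle_state=True models a platform that can also power down the
    shared logic of a node once all of that node's cores are idle.
    """
    total = 0.0
    for busy in busy_per_node:
        if not 0 <= busy <= cores_per_node:
            raise ValueError("busy core count out of range")
        idle = cores_per_node - busy
        total += busy * CORE_ACTIVE_W + idle * CORE_GATED_W
        # The shared logic stays powered unless the whole node is idle
        # and a node level idle state is available.
        if not (node_idle_state and busy == 0):
            total += NODE_UNCORE_W
    return total


if __name__ == "__main__":
    for layout in ([6, 0], [3, 3]):
        print(layout,
              system_power(layout),
              system_power(layout, node_idle_state=True))
```

With core gating only, the consolidated and the spread layouts draw the
same power in this model, so consolidating onto one node buys nothing.
Consolidation only wins once a node level idle state is assumed to exist
and to be entered.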


Preeti U Murthy
> Vincent
>> Regards
>> Preeti U Murthy
