Re: [RFC PATCHC 0/3] sched/idle : find the idlest cpu with cpuidle info

From: Rafael J. Wysocki
Date: Fri Apr 04 2014 - 07:08:13 EST

On Wednesday, April 02, 2014 10:26:31 AM Daniel Lezcano wrote:
> On 04/02/2014 01:01 AM, Rafael J. Wysocki wrote:
> > On Friday, March 28, 2014 01:29:53 PM Daniel Lezcano wrote:
> >> The following patchset provides an interaction between cpuidle and the scheduler.
> >>
> >> The first patch encapsulate the needed information for the scheduler in a
> >> separate cpuidle structure. The second one stores the pointer to this structure
> >> when entering idle. The third one, use this information to take the decision to
> >> find the idlest cpu.
> >>
> >> After some basic testing with hackbench, it appears there is an improvement for
> >> the performances (small) and for the duration of the idle states (which provides
> >> a better power saving).
> >>
> >> The measurement has been done with the 'idlestat' tool previously posted in this
> >> mailing list.
> >>
> >> So the benefit is good for both sides performance and power saving.
> >>
> >> The select_idle_sibling could be also improved in the same way.
> >
> > Well, quite frankly, I don't really like this series. Not the idea itself, but
> > the way it has been implemented.
> >
> > First off, if the scheduler is to access idle state data stored in struct
> > cpuidle_state, I'm not sure why we need a separate new structure for that?
> > Couldn't there be a pointer to a whole struct cpuidle_state from struct rq
> > instead? [->exit_latency is the only field that find_idlest_cpu() in your
> > third patch seems to be using anyway.]
> Hi Rafael,
> thank you very much for reviewing the patchset.
> I created a specific structure to encapsulate the informations needed
> for the scheduler and to prevent to export unneeded data. This is purely
> for code design. Also it was to separate the idle's energy
> characteristics from the cpuidle framework data (flags, name, etc ...).
> The exit_latency field is only used in this patchset but the
> target_residency will be used also (eg. prevent to wakeup a cpu before
> the minimum idle time target residency).


It would be good to add that heuristics upfront so that we can see the full

> The power field is ... hum ... not filled by any board (except for
> calxeda). Vendors do not like to share this information, so very likely
> that would be changed to a normalized value, I don't know.

I'm not sure if that field is ever going to be used by everyone to be honest.

> I agree we can put a pointer to the struct cpuidle_state instead if that
> reduce the impact of the patchset.

Yes, it will, in my opinion.

> > Second, is accessing the idle state information for all CPUs from find_idlest_cpu()
> > guaranteed to be non-racy? I mean, what if a CPU changes its state from idle to
> > non-idle while another one is executing find_idlest_cpu()? In other words,
> > where's the read memory barrier corresponding to the write ones in the modified
> > cpu_idle_call()? And is the memory barrier actually sufficient? After all,
> > you need to guarantee that the CPU is still idle after you have evaluated
> > idle_cpu() on it.
> Well, as Nicolas mentioned it in another mail, we can live with races,
> the scheduler will take a wrong decision but nothing worth than what we

I guess you mean "worse"? I'm not sure about that.

> have today. In any case we want to prevent any lock in the code.

Of course. :-)

> > Finally, is really the heuristics used by find_idlest_cpu() to select the "idlest"
> > CPU the best one? What about deeper vs shallower idle states, for example?
> I believe it is what is supposed to do the patchset. 1. if the cpu is
> idle, pick the shallower, 2. if the cpu is not idle pick the less
> loaded. But may be there is something wrong in the routine as pointed
> Nico, I have to double check it.

Yes, that routine doesn't look entirely correct then.


I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at