Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

From: Lorenzo Pieralisi
Date: Sat Feb 01 2014 - 10:40:53 EST


On Sat, Feb 01, 2014 at 06:00:40AM +0000, Brown, Len wrote:
> > Right now (on ARM at least but I imagine this is pretty universal), the
> > biggest impact on information accuracy for a CPU depends on what the
> > other CPUs are doing. The most obvious example is cluster power down.
> > For a cluster to be powered down, all the CPUs sharing this cluster must
> > also be powered down. And all those CPUs must have agreed to a possible
> > cluster power down in advance as well. But it is not because an idle
> > CPU has agreed to the extra latency imposed by a cluster power down that
> > the cluster has actually powered down since another CPU in that cluster
> > might still be running, in which case the recorded latency information
> > for that idle CPU would be higher than it would be in practice at that
> > moment.
>
> That will not work.
>
> When a CPU goes idle, it uses the CURRENT criteria for entering that state.
> If the criteria change after it has entered the state, are you going
> to wake it up so it can re-evaluate? No.
>
> That is why the state must describe the worst case latency
> that CPU may see when waking from the state on THAT entry.
>
> That is why we use the package C-state numbers to describe
> core C-states on IA.

That's what we do on ARM too for cluster states. But the state decision
might turn out suboptimal in this case too for the same reasons you have
just mentioned.
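
To make the gap concrete, here is a minimal sketch of the difference between
the latency recorded at idle entry and the latency actually paid on wake-up.
Everything in it is an assumption made for illustration: struct cpu_state,
both helper functions, and the latency values are hypothetical and are not
taken from the kernel or from any real platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit latencies in microseconds (illustrative values only). */
enum { CORE_EXIT_LATENCY_US = 100, CLUSTER_EXIT_LATENCY_US = 1500 };

/* Hypothetical per-CPU view, not a kernel structure. */
struct cpu_state {
	bool idle;		/* CPU is currently idle */
	bool cluster_vote;	/* CPU agreed to cluster power down on entry */
};

/* Latency recorded at idle entry: the worst case for the requested state. */
static unsigned int recorded_latency_us(const struct cpu_state *cpu)
{
	if (!cpu->idle)
		return 0;
	return cpu->cluster_vote ? CLUSTER_EXIT_LATENCY_US
				 : CORE_EXIT_LATENCY_US;
}

/*
 * Latency paid in practice: the cluster latency only applies when every
 * CPU in the cluster is idle and has voted for cluster power down.
 */
static unsigned int effective_latency_us(const struct cpu_state *cpus,
					 size_t ncpus, size_t self)
{
	assert(self < ncpus);

	if (!cpus[self].idle)
		return 0;

	for (size_t i = 0; i < ncpus; i++) {
		if (!cpus[i].idle || !cpus[i].cluster_vote)
			return CORE_EXIT_LATENCY_US;
	}
	return CLUSTER_EXIT_LATENCY_US;
}
```

With one CPU idle and voting for cluster power down while its sibling is
still running, the recorded value is the cluster latency but the effective
value is only the core latency, which is the overestimate described above.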

There are some use cases where it matters (for instance, where monitoring
the timers on all CPUs in a cluster shows that cluster shutdown must be
aborted because some CPUs have a pending timer and the governor decision is
stale), and there are some use cases where it does not matter at all.
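
As a rough sketch of the timer check I have in mind (assumptions only:
struct cpu_idle_info, cluster_shutdown_ok() and the min_residency_us
parameter are hypothetical names for illustration, not existing kernel or
firmware interfaces), the last CPU going down, or FW, could do something
like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU view, not a kernel structure. */
struct cpu_idle_info {
	bool idle;		/* CPU is currently idle */
	bool cluster_vote;	/* CPU agreed to cluster power down on entry */
	uint64_t next_timer_us;	/* absolute time of the next timer event */
};

/*
 * Return true if cluster shutdown may proceed at time now_us.
 * min_residency_us is an assumed break-even residency for the cluster
 * state: shutting down for less than that is not worth it.
 */
static bool cluster_shutdown_ok(const struct cpu_idle_info *cpus, size_t ncpus,
				uint64_t now_us, uint64_t min_residency_us)
{
	assert(cpus != NULL && ncpus > 0);

	for (size_t i = 0; i < ncpus; i++) {
		/* Every CPU in the cluster must be idle and must have agreed. */
		if (!cpus[i].idle || !cpus[i].cluster_vote)
			return false;

		/* A timer that is due, or due too soon, makes the decision stale. */
		if (cpus[i].next_timer_us <= now_us ||
		    cpus[i].next_timer_us - now_us < min_residency_us)
			return false;
	}
	return true;
}
```

Whether a check like this is worth its cost is exactly the use-case
dependent part: it only pays off when stale governor decisions are frequent
enough to matter.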

We talked about this at LPC, and I guess x86 FW/HW plays a role in
package state demotion too; we can do it in FW on ARM.

Overall, we all know that whatever we do, it is impossible to know the
precise C-state a CPU is in, even if we resort to HW probing. It is just
a matter of deciding what level of abstraction is necessary for the
scheduler to work properly.

Thanks for bringing this topic up.

Lorenzo
