Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

From: Yuyang Du
Date: Fri Jun 06 2014 - 04:39:29 EST

On Fri, Jun 06, 2014 at 10:05:43AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 06, 2014 at 04:29:30AM +0800, Yuyang Du wrote:
> > On Thu, Jun 05, 2014 at 08:03:15AM -0700, Dirk Brandewie wrote:
> > >
> > > You can request a P state per core but the package does coordination at
> > > a package level for the P state that will be used based on all requests.
> > > This is due to the fact that most SKUs have a single VR and PLL. So
> > > the highest P state wins. When a core goes idle it loses its vote
> > > for the current package P state and that core's clock is turned off.
> > >
> >
> > You need to differentiate Turbo and non-Turbo. The highest P state wins? Not
> > really.
> *sigh* and here we go again.. someone please, write something coherent
> and have all intel people sign off on it and stop saying different
> things.
> > Actually, the silicon supports independent non-Turbo P states, but it is just not enabled.
> Then it doesn't exist, so no point in mentioning it.

Well, things actually get more complicated. Not-enabled applies to Core. On Atom
Baytrail, each core can indeed run at a different frequency. I am not sure about
Xeon, :)

> > For Turbo, it basically depends on power budget of both core and gfx (because
> > they share) for each core to get which Turbo point.
> And RAPL controls can give preference of which gfx/core gets most,
> right?

Maybe Jacob knows that.

> > > intel_pstate tries to keep the core P state as low as possible to satisfy
> > > the given load, so when various cores go idle the package P state can be
> > > as low as possible. The big power win is a core going idle.
> > >
> >
> > In terms of prediction, it definitely can't be 100% right. But the
> > performance of most workloads does scale with P-state (frequency), maybe not
> > linearly. So it is to some point predictable, FWIW. And this is the basic
> > assumption of all governors and intel_pstate.
> So frequency isn't _that_ interesting, voltage is. And while
> predictability it might be their assumption, is it actually true? I
> mean, there's really nothing else except to assume that, if its not you
> can't do anything at all, so you _have_ to assume this.
> But again, is the assumption true? Or just happy thoughts in an attempt
> to do something.

Voltage is coupled with frequency: roughly, voltage is proportional to frequency, so,
roughly, power is proportional to voltage^3. You can't say which is more important;
there is no reason to raise voltage without raising frequency.

If I had to answer in one word, true or false, it is true: given any fixed
workload, I can't see why performance would be worse if frequency is higher.

The reality, as opposed to the assumption, is two-fold:
1) If the workload is CPU bound, performance scales with frequency absolutely; if it is
memory bound, it does not scale. But from the kernel we don't know (or it is hard to
know) whether it is CPU bound or not. uArch statistics can model that.
2) The workload is not fixed in real time; it changes all the time.

But still, the assumption is a must, and it does no harm, because we adjust frequency
continuously: for example, if the workload is fixed and performance does not scale with
frequency, we stop increasing frequency. So a good frequency governor or driver should,
and can, continuously pursue a "good" frequency as the workload changes. Therefore, in
the long term, we will be better off.
