RE: Performance regression in v3.14

From: Doug Smythies
Date: Wed May 28 2014 - 12:06:55 EST



On 2014.05.27 01:40 Yuyang Du wrote:
>> On 2014.05.27 01:00, Johan Hovold wrote:
>> I tried applying your (rejected) patch "intel_pstate: Remove C0
>> tracking" posted here:
>>
>> https://lkml.org/lkml/2014/5/8/574
>>
>> to v3.14.4 and it fixes the problem as expected.
>>
>> So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
>> account for core busy calculation") that went into v3.14-rc2 (and was
>> even marked for *stable*) that first broke Greg KH's system:
>>
>> https://lkml.org/lkml/2014/2/19/626
>>
>> That was apparently fixed by e66c17683746 ("intel_pstate: Change
>> busy calculation to use fixed point math."), but still left v3.14
>> basically unusable for lower-intensity workloads such as my
>> bash-completion example and other reported regressions:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>>
>> Sure there may be issues with v3.13 not hitting the lowest frequencies
>> but at least the system was *usable*.
>>
>> In my opinion there's really no other option than to restore the 3.13
>> behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
>> core C0 time into account for core busy calculation") until you have
>> figured out a way to take C0 into account without breaking things too
>> badly.

> Hi all,

> My posts before and now are only relevant to why C0 tracking can't be
> removed. Maybe I need to elaborate on it a little bit more.

> In a nutshell, without C0 tracking, the intel_pstate is effectively
> performance governor in terms of frequency control.

That is not true. The CPU frequency versus load response curve
is different, and considerably more aggressive, for performance mode
when compared to powersave mode with C0 tracking removed.

I'll add a relevant graph to the bugzilla report referenced above.
(but it will be a few hours before I do.)

> Why? Without C0 tracking, the machinery of the freq control
> is as I formed:
> last_freq_average / last_requested_freq ==> setpoint
> which can be virtually formed into:
> last_freq_average / last_requested_freq * last_C0_pct ==>
> setpoint * last_C0_pct

> That said, the control machinery will increase the frequency
> at ANY frequency at ANY C0_pct (which is the CPU utilization),
> since setpoint is less than 100 percent.

That is not true. Yes, because the setpoint is less than
100, which is needed or the driver won't work at all, there is
a tendency to drive the target pstate upwards.
However, that is tempered by both the PID proportional gain
and, ultimately, integer math. More importantly, the CPU
itself tells the driver when it is operating below the target
pstate, and the driver responds.
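
As a rough illustration of the integer math point, here is a toy
proportional-only model. It is not the driver's code: the function
name, the setpoint of 97 and the gain of 20 percent are assumptions
chosen purely for illustration.

```python
def pstate_step(busy_pct, setpoint_pct=97, p_gain_pct=20):
    """Toy proportional-only controller step, in whole pstates.

    Positive means "raise the target pstate", negative means "lower it".
    All names and default values are illustrative assumptions.
    """
    error = busy_pct - setpoint_pct
    # int() truncates toward zero, standing in for integer math.
    return int(error * p_gain_pct / 100)


for busy in (90, 95, 100, 105, 110):
    print(busy, pstate_step(busy))
```

In this toy model a busy value of 100 against a setpoint of 97 gives
a step of zero: the small error is lost to truncation, so the target
does not creep upwards on every sample.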

Additionally, the tendency to drive up the target pstate
too much is exacerbated by some extra rounding up at a
couple of spots. Dirk has a pending fix.

> And a few iterations
> later, we will reach max (possible) frequency,
> then we are effectively performance governor
> (highest frequency all the time).

Please do not confuse highest target pstate with
highest frequency. They are not the same. The processor
itself can back off.
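
To sketch the distinction: the frequency actually delivered while in
C0 can be estimated from the APERF and MPERF counter deltas,
whatever pstate was requested. The helper name and the numbers below
are made up for illustration.

```python
def delivered_freq_mhz(delta_aperf, delta_mperf, base_mhz):
    """Estimate the average delivered frequency while the core was in C0.

    APERF advances at the actual clock rate and MPERF at a fixed
    reference rate, so their ratio scales the reference frequency.
    """
    if delta_mperf == 0:
        raise ValueError("no C0 time in this sample")
    return base_mhz * delta_aperf / delta_mperf


# Hypothetical sample: the highest pstate was requested on a part with
# a 3400 MHz reference clock, but the processor backed off.
print(delivered_freq_mhz(2_600_000, 3_400_000, 3400))
```

A result below the frequency implied by the target pstate is the
processor backing off on its own.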

... Doug

