RE: [PATCH] [v4] x86, suspend: Save/restore extra MSR registers for suspend

From: Doug Smythies
Date: Fri Oct 09 2015 - 14:55:53 EST


Note: This reply is more for general information than anything else.

Since Yu's original e-mail I have searched all forums and bug reports
that I would find in an attempt to gain knowledge as to the extent
of Clock Modulation becoming enabled issues.

It is common, but particularly with some types of Dell LapTops
upon resume from suspend on battery power.

For the most part, users simply disable the intel_pstate driver, and
move on with their lives using the acpi-cpufreq driver. And once they
move on we never hear from 90% of them again, even though they agree
to do more tests and report back.

Recall my original reply to this thread, a portion copied below:

> The current version of the intel_pstate driver is incompatible with
> any use of Clock Modulation, always resulting in driving the target
> pstate to the minimum, regardless of load. The result is the apparent
> CPU frequency stuck at minimum * modulation percent.

> The acpi-cpufreq driver works fine with Clock Modulation, resulting in
> desired frequency * modulation percent.

For the most part users don't even notice the reduced performance
(which is typically 75 or 87.5 percent of normal) when using the
acpi-cpufreq driver.

Yu's specific example is special, because the Clock Modulation
percent is undefined (reserved), and thus clearly unintentional,
or just plain wrong if it is intentional.

Other comments in-line below:

> On 2015.10.07 02:39 Chen, Yu C wrote:
> On 2105.09.17 13:30 Pavel Machek wrote:
>> On Thu 2015-08-27 11:18:27, Chen Yu wrote:
>>> A bug is reported(https://bugzilla.redhat.com/show_bug.cgi?id=1227208)
>>> that, after resumed from S3, CPU is running at a low speed.
>>> After investigation, it is found that, BIOS has modified the value of
>>> THERM_CONTROL register during S3, and changes it from 0 to 0x10, since
>>> value of 0x10 means CPU can only get 25% of the Duty Cycle, this
>>> triggers the problem.

Note: THERM_CONTROL register = IA32_CLOCK_MODULATION

I suspect the outcome for an undefined Clock Modulation value of 0
is processor dependent. For example on my i7-2600K I get 50% duty cycle
if I manually write 0x10 to the register.

>>>
>>> Here is a simple scenario to reproduce the issue:
>>> 1.Boot up the system
>>> 2.Get MSR with address 0x19a, it should be 0

It doesn't have to be, and in some cases I found it isn't,
but Clock Modulation isn't enabled.
Only bit 4 matters.

>>> 3.Put the system into
>>> sleep, then wake it up 4.Get MSR with address 0x19a, it should be
>>> 0(actually it shows 0x10)

Again, it doesn't have to be 0, only bit 4 matters.

In the problem cases I found, typically the value is 0x1c or 0x1e,
for Clock Modulation percentages of 75% or 87.5%

However, thanks very much for the "how to" I used it a lot.

>>>
>>> Although this is a BIOS issue,

In your specific case, and since the register value is undefined yes.
In the resume from suspend on battery power case, it might
be intentional. (there has been no response from Dell on the Dell
forum where I asked.)

>>> it would be more robust for linux to
>>> deal with this situation. This patch fixes this issue by introducing a
>>> framework to save/restore specified MSR registers(THERM_CONTROL in
>>> this case) for suspend/resume.
>>>
>>> When user encounters a problematic platform and needs to protect the
>>> MSRs during suspending, he can simply add a quirk entry in
>>> msr_save_dmi_table, and customizes MSR registers inside the quirk
>>> callback, for example:

This might be hard to maintain and cause significant delays for actual end
user availability for these battery resume type cases.

Is there a point to be made here?:

Yes, I think the intel_pstate driver should be improved. Currently it is overly
sensitive to non-standard system perturbations, sometimes resulting in driving
down the CPU frequency, as in this case, and sometimes driving up the CPU frequency
unnecessarily. I am saying the driver doesn't pass sensitivity analysis well.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/