RE: [PATCH] [v4] x86, suspend: Save/restore extra MSR registers for suspend

From: Chen, Yu C
Date: Thu Nov 12 2015 - 04:42:29 EST


Hi,
> -----Original Message-----
> From: Doug Smythies [mailto:dsmythies@xxxxxxxxx]
> Sent: Friday, November 06, 2015 11:34 PM
> To: Chen, Yu C
> Cc: Wysocki, Rafael J; tglx@xxxxxxxxxxxxx; hpa@xxxxxxxxx; bp@xxxxxxxxx;
> Zhang, Rui; linux-pm@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Brown, Len; 'Ingo Molnar'; 'Pavel Machek';
> 'Kristen Carlson Accardi'; Pandruvada, Srinivas
> Subject: RE: [PATCH] [v4] x86, suspend: Save/restore extra MSR registers for
> suspend
>
>
> On 2015.11.01 08:50 Chen, Yu C wrote:
> >> On 2015.10.10 19:27 Chen, Yu C wrote:
> >>> On 2015.10.10 02:56 Doug Smythies wrote:
> >>>
> >>>>> The current version of the intel_pstate driver is incompatible
> >>>>> with any use of Clock Modulation, always resulting in driving the
> >>>>> target pstate to the minimum, regardless of load. The result is
> >>>>> the apparent CPU frequency stuck at minimum * modulation percent.
> >>>>
> >>>>> The acpi-cpufreq driver works fine with Clock Modulation,
> >>>>> resulting in desired frequency * modulation percent.
> >>>>
> >>
> >>> [Yu] Why is the intel_pstate driver incompatible with Clock Modulation?
> >>
> >> It is simply how the current control algorithm responds to the scenario.
> >>
> >> The problem is in intel_pstate_get_scaled_busy, here:
> >>
> >> /*
> >> * core_busy is the ratio of actual performance to max
> >> * max_pstate is the max non turbo pstate available
> >> * current_pstate was the pstate that was requested during
> >> * the last sample period.
> >> *
> >> * We normalize core_busy, which was our actual percent
> >> * performance to what we requested during the last sample
> >> * period. The result will be a percentage of busy at a
> >> * specified pstate.
> >> */
> >> core_busy = cpu->sample.core_pct_busy;
> >> max_pstate = int_tofp(cpu->pstate.max_pstate);
> >> current_pstate = int_tofp(cpu->pstate.current_pstate);
> >> core_busy = mul_fp(core_busy, div_fp(max_pstate,
> >> current_pstate));
> >>
> >> With Clock Modulation enabled, the actual performance percent will
> >> always be less than what was asked for, basically meaning
> >> current_pstate is much less than what was asked for. Thus the
> >> algorithm will drive down the target pstate regardless of load.
> >>
> > [Yu] Do you mean that there is a problem with the normalization, and
> > that we should use the actual pstate rather than the theoretical
> > current_pstate? For example, the pseudocode might look like:
> >
> > - current_pstate = int_tofp(cpu->pstate.current_pstate);
> > + current_pstate = int_tofp(cpu->pstate.current_pstate)*0.85;
>
> I did not think of normalizing / compensating at this point.
> That is a good idea.
> Just for a test, I tried it and it seems to work well.
> Before normalizing / compensating, core_busy can be quite small for lesser
> clock modulation duty cycles, and so becomes a little noisy afterwards.
>
> For my test, on an otherwise unaltered kernel v4.3 I did this:
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index aa33b92..97a90e1 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -821,6 +821,7 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> int32_t core_busy, max_pstate, current_pstate, sample_ratio;
> s64 duration_us;
> u32 sample_time;
> + u64 clock_modulation;
>
> /*
> * core_busy is the ratio of actual performance to max
> @@ -836,6 +837,17 @@ static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> core_busy = cpu->sample.core_pct_busy;
> max_pstate = int_tofp(cpu->pstate.max_pstate);
> current_pstate = int_tofp(cpu->pstate.current_pstate);
> +
> +// rdmsrl(MSR_IA32_CLOCK_MODULATION, clock_modulation);
> + rdmsrl(MSR_IA32_THERM_CONTROL, clock_modulation);
> + if (clock_modulation & 0x10) {
> + clock_modulation = clock_modulation & 0x0F;
> + if(clock_modulation == 0) clock_modulation = 8;
> + core_busy = mul_fp(core_busy, int_tofp(0x10));
> + core_busy = div_fp(core_busy, int_tofp(clock_modulation));
> + }
> +
rdmsr_safe might be better here; you can refer to acpi_throttling_rdmsr for
an example. Otherwise I'm OK with this code. Are you planning to send a
formal patch?
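
For reference, here is a minimal user-space sketch of the compensation
arithmetic from the test patch above, combined with the idea of skipping the
adjustment when the MSR read fails (which is what a non-zero return from
rdmsr_safe would signal). The fixed-point helpers are stand-ins modeled on the
ones in intel_pstate.c, with FRAC_BITS assumed to be 8. The names
compensate_core_busy(), CLOCK_MOD_ENABLE, CLOCK_MOD_DUTY_MASK and the read_err
parameter are hypothetical and only for illustration. The duty-cycle encoding
(low 4 bits in sixteenths, 0 treated as 8) is taken from the test patch and has
not been checked against the SDM. The enable test uses a bitwise &, since a
logical && would be true for any non-zero register value.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the driver's fixed-point helpers
 * (assumption: 8 fractional bits, as in intel_pstate.c). */
#define FRAC_BITS 8
#define int_tofp(x) ((int64_t)(x) * ((int64_t)1 << FRAC_BITS))
#define fp_toint(x) ((x) >> FRAC_BITS)

/* Hypothetical names for the bits used by the test patch. */
#define CLOCK_MOD_ENABLE    0x10u
#define CLOCK_MOD_DUTY_MASK 0x0Fu

static int32_t mul_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

static int32_t div_fp(int64_t x, int64_t y)
{
	assert(y != 0);	/* callers must never pass a zero divisor */
	return (int32_t)((x * ((int64_t)1 << FRAC_BITS)) / y);
}

/*
 * Scale core_busy (fixed point) back up by 16 / duty when clock
 * modulation is enabled. read_err models the return value of a
 * fault-tolerant MSR read: non-zero means msr_val is not valid,
 * so core_busy is returned unchanged.
 */
static int32_t compensate_core_busy(int32_t core_busy, int read_err,
				    uint64_t msr_val)
{
	uint64_t duty;

	if (read_err || !(msr_val & CLOCK_MOD_ENABLE))
		return core_busy;

	duty = msr_val & CLOCK_MOD_DUTY_MASK;
	if (duty == 0)
		duty = 8;	/* same fallback as the test patch */

	core_busy = mul_fp(core_busy, (int32_t)int_tofp(16));
	return div_fp(core_busy, int_tofp(duty));
}
```

With core_busy at 25% and a register value of 0x18 (enabled, duty 8/16), this
returns 50%. With the enable bit clear, or with a failed read, core_busy comes
back unchanged.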

Yu