Re: problem in changing from active to passive mode

From: Rafael J. Wysocki
Date: Thu Oct 28 2021 - 15:48:37 EST


On Thu, Oct 28, 2021 at 9:25 PM Julia Lawall <julia.lawall@xxxxxxxx> wrote:
>
>
>
> On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
>
> > On Thu, Oct 28, 2021 at 9:13 PM Julia Lawall <julia.lawall@xxxxxxxx> wrote:
> > >
> > >
> > >
> > > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> > >
> > > > On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@xxxxxxxx> wrote:
> > > > > > >
> > > > > > > > Now, for your graph 3, are you saying this pseudo
> > > > > > > > code of the process is repeatable?:
> > > > > > > >
> > > > > > > > Power up the system, booting kernel 5.9
> > > > > > > > switch to passive/schedutil.
> > > > > > > > wait X minutes for system to settle
> > > > > > > > do benchmark, result ~13 seconds
> > > > > > > > re-boot to kernel 5.15-RC
> > > > > > > > switch to passive/schedutil.
> > > > > > > > wait X minutes for system to settle
> > > > > > > > do benchmark, result ~40 seconds
> > > > > > > > re-boot to kernel 5.9
> > > > > > > > switch to passive/schedutil.
> > > > > > > > wait X minutes for system to settle
> > > > > > > > do benchmark, result ~28 seconds
> > > > > > >
> > > > > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > > > > passive mode) it is 10. I don't know though if this is a bug or a
> > > > > > > feature...
> > > > > >
> > > > > > It looks like a bug.
> > > > > >
> > > > > > I think that the desired value is not cleared on driver exit which
> > > > > > should happen. Let me see if I can do a quick patch for that.
> > > > >
> > > > > Please check the behavior with the attached patch applied.
> > > >
> > > > Well, actually, the previous one won't do anything, because the
> > > > desired perf field is already cleared in this function before writing
> > > > the MSR, so please try the one attached to this message instead.
> > > >
> > >
> > > Turbostat still shows 10:
> > >
> > > cpu0: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
> > > cpu0: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
> > > cpu0: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
> > > cpu0: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
> > > cpu1: MSR_PM_ENABLE: 0x00000001 (HWP)
> > > cpu1: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
> > > cpu1: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
> > > cpu1: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
> > > cpu1: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
> > > cpu2: MSR_PM_ENABLE: 0x00000001 (HWP)
> > > cpu2: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
> > > cpu2: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
> > > cpu2: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
> > > cpu2: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
> > > cpu3: MSR_PM_ENABLE: 0x00000001 (HWP)
> > > cpu3: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
> > > cpu3: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
> > > cpu3: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
> > > cpu3: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
> >
> > Hmmm.
> >
> > Is this also the case if you go from "passive" to "active" on 5.15-rc
> > w/ the patch applied?
>
> Sorry, I was wrong. If I am in 5.15 and go from passive to active, the
> des field indeed returns to 0. If I use kexec

Well, this means that the cpufreq driver cleanup is not carried out in
the kexec path and the old desired value remains in the register.

> to reboot from 5.15 passive into 5.9, then the des field remains 10.

It looks like desired perf needs to be cleared explicitly in the active mode.

Attached is a patch to do that, but please note that 5.9 will need
to be patched too to address this issue.
---
 drivers/cpufreq/intel_pstate.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -946,6 +946,8 @@ static void intel_pstate_hwp_set(unsigne
 	value &= ~HWP_MAX_PERF(~0L);
 	value |= HWP_MAX_PERF(max);
 
+	value &= ~HWP_DESIRED_PERF(~0L);
+
 	if (cpu_data->epp_policy == cpu_data->policy)
 		goto skip_epp;
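
For illustration only, the effect of the added line on the MSR_HWP_REQUEST value reported by turbostat above (0x000a2525) can be sketched in standalone user-space C. This is not kernel code: the SKETCH_ macro and the helper names are hypothetical, and the field layout (min in bits 7:0, max in bits 15:8, des in bits 23:16) is an assumption inferred from the turbostat decoding quoted in this thread.

```c
#include <stdint.h>

/*
 * Assumed layout, inferred from turbostat's decoding of 0x000a2525
 * as "min 37 max 37 des 10": desired perf occupies bits 23:16.
 * SKETCH_HWP_DESIRED_PERF is a stand-in with the same shape as the
 * mask macro used in the patch, not the kernel's definition.
 */
#define SKETCH_HWP_DESIRED_PERF(x)	(((uint64_t)(x) & 0xff) << 16)

/* Extract the desired-performance field from a HWP request value. */
static inline unsigned int sketch_hwp_desired(uint64_t value)
{
	return (unsigned int)((value >> 16) & 0xff);
}

/*
 * Clear the desired-performance field, mirroring the line added by
 * the patch: value &= ~HWP_DESIRED_PERF(~0L);
 * All other fields (min, max, epp, ...) are left untouched.
 */
static inline uint64_t sketch_clear_desired(uint64_t value)
{
	return value & ~SKETCH_HWP_DESIRED_PERF(~0L);
}
```

Under these assumptions, sketch_hwp_desired(0x000a2525) yields 10, matching the "des 10" shown by turbostat, and sketch_clear_desired(0x000a2525) yields 0x00002525, that is, des 0 with min and max still at 37.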