RE: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems

From: Jones, Morgan
Date: Tue Sep 03 2024 - 17:07:37 EST


Seems like 6.10 is OK but 6.6 is not.

Linux redact 6.10.7 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux

analyzing CPU 50:
driver: amd-pstate-epp
CPUs which run at the same hardware frequency: 50
CPUs which need to have their frequency coordinated by software: 50
maximum transition latency: Cannot determine or is not supported.
hardware limits: 400 MHz - 3.35 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 400 MHz and 3.35 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 3.32 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: yes
AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.35 GHz.
AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.


-----Original Message-----
From: Mario Limonciello <mario.limonciello@xxxxxxx>
Sent: Tuesday, September 3, 2024 1:10 PM
To: Jones, Morgan <Morgan.Jones@xxxxxxxxxx>; Dhananjay Ugwekar <Dhananjay.Ugwekar@xxxxxxx>; rafael@xxxxxxxxxx; viresh.kumar@xxxxxxxxxx; gautham.shenoy@xxxxxxx; perry.yuan@xxxxxxx; skhan@xxxxxxxxxxxxxxxxxxx; li.meng@xxxxxxx; ray.huang@xxxxxxx
Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David Arcari <darcari@xxxxxxxxxx>
Subject: Re: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems

Morgan,

OK that's great news that it's just a backport effort. That same commit also backported to 6.10.3. Can you see if 6.10.y is affected?

Ugwekar,

Any thoughts on what else needs to come back to 6.6.y off hand?

Thanks,

On 9/3/2024 15:07, Jones, Morgan wrote:
> Hey Mario,
>
> Smoking gun here, the max frequency is incorrect on 6.6.44+ but is correct on 6.11.0-rc6.
>
> Linux redact 6.11.0-rc6 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1
> 00:00:00 UTC 1980 x86_64 GNU/Linux
>
> analyzing CPU 12:
> driver: amd-pstate-epp
> CPUs which run at the same hardware frequency: 12
> CPUs which need to have their frequency coordinated by software: 12
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 400 MHz - 3.35 GHz
> available cpufreq governors: performance powersave
> current policy: frequency should be within 400 MHz and 3.35 GHz.
> The governor "performance" may decide which speed to use
> within this range.
> current CPU frequency: Unable to call hardware
> current CPU frequency: 3.34 GHz (asserted by call to kernel)
> boost state support:
> Supported: yes
> Active: yes
> AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.35 GHz.
> AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
> AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
> AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
>
> We're running amd_pstate=active and amd_pstate.shared_mem=1, and our workloads are back to normal performance on 6.11.0-rc6.
>
> Morgan
>
> -----Original Message-----
> From: Mario Limonciello <mario.limonciello@xxxxxxx>
> Sent: Tuesday, September 3, 2024 10:55 AM
> To: Jones, Morgan <Morgan.Jones@xxxxxxxxxx>; Dhananjay Ugwekar
> <Dhananjay.Ugwekar@xxxxxxx>; rafael@xxxxxxxxxx;
> viresh.kumar@xxxxxxxxxx; gautham.shenoy@xxxxxxx; perry.yuan@xxxxxxx;
> skhan@xxxxxxxxxxxxxxxxxxx; li.meng@xxxxxxx; ray.huang@xxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David
> Arcari <darcari@xxxxxxxxxx>
> Subject: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the
> scaling_max_freq setting on shared memory CPPC systems
>
> Hi Morgan,
>
> Can you please cross reference 6.11-rc6 to see if you're still having a problem? I would like to understand if we're missing a backport to stable or this is still an upstream problem.
>
> Thanks,
>
> On 9/3/2024 12:51, Jones, Morgan wrote:
>> Hi there,
>>
>> We are experiencing a ~35% performance regression on an AMD EPYC 7702 64 core machine after applying this patch. We observed Linux 6.6.44 starting to cause the issue, and performed a bisect involving rebooting the machine over 15 times. (Note that kexec didn't successfully identify the problem, since the PM memory mailbox is never reset).
>>
>> It appears that we are getting a max of 2.18 GHz when this CPU can boost to 3.35 GHz, explaining the slowdown. Blacklisting amd-pstate solves the issue at the expense of the performance increase we used to get by using it.
>>
>> Is it possible that the upper limits were implicitly at the max CPU power before, and setting the upper limit to something other than the boost frequency can reduce performance now?
>>
>> # bad: [7213910600667c51c978e577bf5454d3f7b313b7] Linux 6.6.44 # good:
>> [58b0425ff5df680d0b67f64ae1f3f1ebdf1c4de9] Linux 6.6.43 git bisect
>> start '7213910600667c51c978e577bf5454d3f7b313b7' '58b0425ff5df680d0b67f64ae1f3f1ebdf1c4de9'
>> # good: [72ff9d26964a3a80f7650df719df139f5c1f965d] arm64: dts: qcom:
>> sm6350: Add missing qcom,non-secure-domain property git bisect good
>> 72ff9d26964a3a80f7650df719df139f5c1f965d
>> # good: [0fffc2e1bf40a2220ef5a38f834ea063dba832d3] ARM: dts: sunxi:
>> remove duplicated entries in makefile git bisect good
>> 0fffc2e1bf40a2220ef5a38f834ea063dba832d3
>> # bad: [8cdbe6ebfd1763a5c41a2a3058497c0a9163311c] pinctrl: renesas:
>> r8a779g0: Fix CANFD5 suffix git bisect bad
>> 8cdbe6ebfd1763a5c41a2a3058497c0a9163311c
>> # bad: [5dbb98e7fa42bebc1325899193d8f13f0705a148] drm/mediatek: Turn
>> off the layers with zero width or height git bisect bad
>> 5dbb98e7fa42bebc1325899193d8f13f0705a148
>> # bad: [691ec7043122c9c8c46d84f6e6cd85d13d50cd93] selftests/bpf: Null
>> checks for links in bpf_tcp_ca git bisect bad
>> 691ec7043122c9c8c46d84f6e6cd85d13d50cd93
>> # bad: [a1359e085d75d7393a250054e66c0a7bc6c3dbfa] perf/x86: Serialize
>> set_attr_rdpmc() git bisect bad
>> a1359e085d75d7393a250054e66c0a7bc6c3dbfa
>> # bad: [e99d9b16ff153de9540073239d24adc3b0a3a997] wifi: ath12k:
>> change DMA direction while mapping reinjected packets git bisect bad
>> e99d9b16ff153de9540073239d24adc3b0a3a997
>> # bad: [d027ac4a08541beb2a89563d3e034da7085050ba] firmware:
>> turris-mox-rwtm: Initialize completion before mailbox git bisect bad
>> d027ac4a08541beb2a89563d3e034da7085050ba
>> # bad: [e6c9eca327e6a41a81e7eba0d0ddc13da37f82a1] ARM: spitz: fix
>> GPIO assignment for backlight git bisect bad
>> e6c9eca327e6a41a81e7eba0d0ddc13da37f82a1
>> # bad: [b8cdefdaa555bbfc269c2198803f8791a8923960] m68k: cmpxchg: Fix
>> return value for default case in __arch_xchg() git bisect bad
>> b8cdefdaa555bbfc269c2198803f8791a8923960
>> # bad: [13a71384ae6a8779da809b00c6f378dcead10427] cpufreq/amd-pstate:
>> Fix the scaling_max_freq setting on shared memory CPPC systems git
>> bisect bad 13a71384ae6a8779da809b00c6f378dcead10427
>> # first bad commit: [13a71384ae6a8779da809b00c6f378dcead10427]
>> cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory
>> CPPC systems
>>
>> cpupower output:
>>
>> analyzing CPU 47:
>> driver: amd-pstate-epp
>> CPUs which run at the same hardware frequency: 47
>> CPUs which need to have their frequency coordinated by software: 47
>> maximum transition latency: Cannot determine or is not supported.
>> hardware limits: 400 MHz - 2.18 GHz
>> available cpufreq governors: performance powersave
>> current policy: frequency should be within 400 MHz and 2.18 GHz.
>> The governor "performance" may decide which speed to use
>> within this range.
>> current CPU frequency: Unable to call hardware
>> current CPU frequency: 2.17 GHz (asserted by call to kernel)
>> boost state support:
>> Supported: yes
>> Active: yes
>> AMD PSTATE Highest Performance: 166. Maximum Frequency: 2.18 GHz.
>> AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
>> AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
>> AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
>>
>> Thanks,
>> Morgan
>>
>> -----Original Message-----
>> From: Mario Limonciello <mario.limonciello@xxxxxxx>
>> Sent: Tuesday, July 2, 2024 10:49 AM
>> To: Dhananjay Ugwekar <Dhananjay.Ugwekar@xxxxxxx>; rafael@xxxxxxxxxx;
>> viresh.kumar@xxxxxxxxxx; gautham.shenoy@xxxxxxx; perry.yuan@xxxxxxx;
>> skhan@xxxxxxxxxxxxxxxxxxx; li.meng@xxxxxxx; ray.huang@xxxxxxx
>> Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David
>> Arcari <darcari@xxxxxxxxxx>
>> Subject: Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the
>> scaling_max_freq setting on shared memory CPPC systems
>>
>> On 7/2/2024 3:14, Dhananjay Ugwekar wrote:
>>> On shared memory CPPC systems, with amd_pstate=active mode, the
>>> change in scaling_max_freq doesn't get written to the shared memory region.
>>> Due to this, the writes to the scaling_max_freq sysfs file don't
>>> take effect. Fix this by propagating the scaling_max_freq changes to
>>> the shared memory region.
>>>
>>> Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP
>>> support for the AMD processors")
>>> Reported-by: David Arcari <darcari@xxxxxxxxxx>
>>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@xxxxxxx>
>> Reviewed-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>> ---
>>> drivers/cpufreq/amd-pstate.c | 43 +++++++++++++++++++-----------------
>>> 1 file changed, 23 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/amd-pstate.c
>>> b/drivers/cpufreq/amd-pstate.c index 9ad62dbe8bfb..a092b13ffbc2
>>> 100644
>>> --- a/drivers/cpufreq/amd-pstate.c
>>> +++ b/drivers/cpufreq/amd-pstate.c
>>> @@ -247,6 +247,26 @@ static int amd_pstate_get_energy_pref_index(struct amd_cpudata *cpudata)
>>> return index;
>>> }
>>>
>>> +static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
>>> + u32 des_perf, u32 max_perf, bool fast_switch) {
>>> + if (fast_switch)
>>> + wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
>>> + else
>>> + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
>>> + READ_ONCE(cpudata->cppc_req_cached));
>>> +}
>>> +
>>> +DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
>>> +
>>> +static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
>>> + u32 min_perf, u32 des_perf,
>>> + u32 max_perf, bool fast_switch) {
>>> + static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
>>> + max_perf, fast_switch);
>>> +}
>>> +
>>> static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
>>> {
>>> int ret;
>>> @@ -263,6 +283,9 @@ static int amd_pstate_set_epp(struct amd_cpudata *cpudata, u32 epp)
>>> if (!ret)
>>> cpudata->epp_cached = epp;
>>> } else {
>>> + amd_pstate_update_perf(cpudata, cpudata->min_limit_perf, 0U,
>>> + cpudata->max_limit_perf, false);
>>> +
>>> perf_ctrls.energy_perf = epp;
>>> ret = cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
>>> if (ret) {
>>> @@ -452,16 +475,6 @@ static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
>>> return static_call(amd_pstate_init_perf)(cpudata);
>>> }
>>>
>>> -static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
>>> - u32 des_perf, u32 max_perf, bool fast_switch)
>>> -{
>>> - if (fast_switch)
>>> - wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
>>> - else
>>> - wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ,
>>> - READ_ONCE(cpudata->cppc_req_cached));
>>> -}
>>> -
>>> static void cppc_update_perf(struct amd_cpudata *cpudata,
>>> u32 min_perf, u32 des_perf,
>>> u32 max_perf, bool fast_switch) @@ -475,16 +488,6 @@
>>> static void cppc_update_perf(struct amd_cpudata *cpudata,
>>> cppc_set_perf(cpudata->cpu, &perf_ctrls);
>>> }
>>>
>>> -DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
>>> -
>>> -static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
>>> - u32 min_perf, u32 des_perf,
>>> - u32 max_perf, bool fast_switch)
>>> -{
>>> - static_call(amd_pstate_update_perf)(cpudata, min_perf, des_perf,
>>> - max_perf, fast_switch);
>>> -}
>>> -
>>> static inline bool amd_pstate_sample(struct amd_cpudata *cpudata)
>>> {
>>> u64 aperf, mperf, tsc;
>>
>>
>