Re: [REGRESSION] amd-pstate doesn't work since v5.18.11

From: Mario Limonciello
Date: Tue Jul 12 2022 - 23:10:21 EST


On 7/12/22 21:40, Yuan, Perry wrote:
[AMD Official Use Only - General]

Hi Mario.

-----Original Message-----
From: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
Sent: Wednesday, July 13, 2022 4:07 AM
To: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>; Yuan, Perry
<Perry.Yuan@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Huang, Ray
<Ray.Huang@xxxxxxx>
Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>; Sasha Levin
<sashal@xxxxxxxxxx>; x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>;
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [REGRESSION] amd-pstate doesn't work since v5.18.11

On 7/12/2022 12:54, Oleksandr Natalenko wrote:
Hello.

On Tuesday, 12 July 2022 19:50:33 CEST Limonciello, Mario wrote:
[Public]

+ Ray

-----Original Message-----
From: Yuan, Perry <Perry.Yuan@xxxxxxx>
Sent: Tuesday, July 12, 2022 12:50
To: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>; Limonciello,
Mario <Mario.Limonciello@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>; Sasha Levin
<sashal@xxxxxxxxxx>; x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>;
Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [REGRESSION] amd-pstate doesn't work since v5.18.11

[AMD Official Use Only - General]

Hi Oleksandr:

-----Original Message-----
From: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>
Sent: Wednesday, July 13, 2022 1:40 AM
To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx
Cc: Yuan, Perry <Perry.Yuan@xxxxxxx>; Rafael J. Wysocki
<rafael.j.wysocki@xxxxxxxxx>; Sasha Levin <sashal@xxxxxxxxxx>;
x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx>
Subject: [REGRESSION] amd-pstate doesn't work since v5.18.11

Hello Mario.

The following commits were pulled into v5.18.11:

```
$ git log --oneline --no-merges v5.18.10..v5.18.11 | grep ACPI
2783414e6ef7 ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported
3068cfeca3b5 ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
8beb71759cc8 ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
13bb696dd2f3 ACPI: CPPC: Check _OSC for flexible address space
```

and now this happens:

```
$ sudo modprobe amd-pstate shared_mem=1
modprobe: ERROR: could not insert 'amd_pstate': No such device
```

With v5.18.10 this worked just fine.

In your upstream commit
8b356e536e69f3a4d6778ae9f0858a1beadabb1f
you write:

```
If there is additional breakage on the shared memory designs also
missing this _OSC, additional follow up changes may be needed.
```

So the question is what else should be pulled into the stable tree
to unbreak amd-pstate?

Thanks.

--
Oleksandr Natalenko (post-factum)


Could you share the lscpu output?

Here's my `lscpu`:

```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 3900XT 12-Core Processor
CPU family: 23
Model: 113
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 59%
CPU max MHz: 3800,0000
CPU min MHz: 2200,0000
BogoMIPS: 7589.71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid
aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic
movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm
rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf
xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale
vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic
v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca
sev sev_es
Virtualization: AMD-V
L1d cache: 384 KiB (12 instances)
L1i cache: 384 KiB (12 instances)
L2 cache: 6 MiB (12 instances)
L3 cache: 64 MiB (4 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-23
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

```

Perry.

Thanks, this is the sort of thing I was worried might happen as a
result of requiring the _OSC. That requirement was introduced as part of
commit 8beb71759cc8.

To solve it I think we need to add more things to
cpc_supported_by_cpu
(https://github.com/torvalds/linux/blob/525496a030de4ae64bb9e1d6bfc88eec6f5fe6e2/arch/x86/kernel/acpi/cppc.c#L19)

The question is how do we safely detect the shared memory designs?
These are a fixed quantity as newer designs /should/ be using the MSR.

I tend to think that, unfortunately, we need an allow-list of shared
memory designs here unless someone has other ideas.

Happy to test any patches as needed.


See if this helps out:

diff --git a/arch/x86/kernel/acpi/cppc.c b/arch/x86/kernel/acpi/cppc.c
index 734b96454896..88a81e6b9228 100644
--- a/arch/x86/kernel/acpi/cppc.c
+++ b/arch/x86/kernel/acpi/cppc.c
@@ -16,6 +16,13 @@ bool cpc_supported_by_cpu(void)
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
 	case X86_VENDOR_HYGON:
+		if (boot_cpu_data.x86 == 0x19 &&
+		    ((boot_cpu_data.x86_model >= 0x00 && boot_cpu_data.x86_model <= 0x0f) ||
+		     (boot_cpu_data.x86_model >= 0x20 && boot_cpu_data.x86_model <= 0x2f)))
+			return true;
+		else if (boot_cpu_data.x86 == 0x17 &&
+			 boot_cpu_data.x86_model >= 0x70 && boot_cpu_data.x86_model <= 0x7f)
+			return true;
 		return boot_cpu_has(X86_FEATURE_CPPC);
 	}
 	return false;

If that works and no one has a better idea how to do it for these systems, I'll
send out a proper patch tomorrow.
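
For anyone who wants to check whether a given machine would hit the new branch without building a kernel, the allow-list in the diff above can be mirrored in userspace. This is only a sketch: the function name `is_shared_mem_cppc_design` and its signature are hypothetical and do not exist in the kernel, and the family/model ranges are copied verbatim from the proposed diff, not from any authoritative list of shared memory designs.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace mirror of the allow-list in the proposed diff.
 * family/model correspond to boot_cpu_data.x86 and boot_cpu_data.x86_model,
 * i.e. the "CPU family" and "Model" fields that lscpu prints in decimal.
 */
bool is_shared_mem_cppc_design(unsigned int family, unsigned int model)
{
	/* Family 19h: models 00h-0fh and 20h-2fh */
	if (family == 0x19 &&
	    (model <= 0x0f || (model >= 0x20 && model <= 0x2f)))
		return true;

	/* Family 17h: models 70h-7fh */
	if (family == 0x17 && model >= 0x70 && model <= 0x7f)
		return true;

	return false;
}
```

The lscpu output earlier in the thread reports CPU family 23 and Model 113, which is family 0x17, model 0x71, so that system falls inside the 0x70-0x7f range and the sketch returns true for it.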

This could be a short-term solution. I would prefer to add a CPU ID check so that we can maintain
one list with all the model info, covering both the MSR and shared memory types.

What's the longer-term solution when it comes to shared mem? It seems like it's either a list of IDs or a heuristic. Given that it's a fixed list and new designs use the MSR, I would think the list of IDs is preferable.

Furthermore, I would argue that if another design were introduced for some reason that uses shared mem instead of the MSR, it should use _OSC to indicate CPPCv2 support rather than this list. This list only needs to exist because the requirement for CPPC support in the _OSC is very recent in the kernel.

Regarding MSR -
boot_cpu_has(X86_FEATURE_CPPC) indicates MSR support. There shouldn't be any need to maintain a list for it in this _OSC override check.
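
As a rough userspace counterpart to that check, one can look for the feature flag in the flags line of /proc/cpuinfo or lscpu. The sketch below assumes that X86_FEATURE_CPPC is exposed under the name `cppc`; the helper `cpu_flags_contain` is hypothetical and only does whole-token matching on a flags string that the caller has already read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the whitespace characters that separate flags, or end of string. */
static bool is_flag_separator(char c)
{
	return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: report whether `name` appears as a whole
 * whitespace-separated token in a /proc/cpuinfo-style flags string.
 * Substrings of longer flags (e.g. "perfctr" in "perfctr_core") do not match.
 */
bool cpu_flags_contain(const char *flags, const char *name)
{
	size_t len = strlen(name);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool starts_token = (p == flags) || is_flag_separator(p[-1]);
		bool ends_token = is_flag_separator(p[len]);

		if (starts_token && ends_token)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}
```

The flags list in the lscpu output earlier in the thread contains `cpb` and `hw_pstate` but no `cppc` entry, which is consistent with that system being a shared memory design rather than an MSR one.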


Meanwhile, I have a similar concern for the upcoming EPP driver: some systems don't support EPP, and we cannot identify them without an ID list.

That will be localized into the EPP driver source at least.


Perry.