Re: k10temp: ZEN3 readings are broken

From: Wei Huang
Date: Mon Dec 21 2020 - 23:35:09 EST




On 12/21/20 9:58 PM, Guenter Roeck wrote:
Hi,

On 12/21/20 5:45 PM, Gabriel C wrote:
Hello Guenter,

While trying to add ZEN3 support to the out-of-tree zenpower module, I found out
that the in-kernel k10temp driver is broken with ZEN3 (and partially even ZEN2).

commit 55163a1c00fcb526e2aa9f7f952fb38d3543da5e added:

case 0x0 ... 0x1: /* Zen3 */

However, this is wrong: we are looking for a model, which is 0x21 for ZEN3.
These seem to be steppings?

These are model numbers for server CPUs. I believe 0x21 is for desktop CPUs. In other words, the current upstream code doesn't support your CPUs. You are welcome to add support for 0x21, but it is wrong to remove support for 0x00/0x01.
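To make the model-versus-stepping distinction concrete, here is a minimal sketch of how a family/model/stepping triple is decoded from the EAX value of CPUID leaf 1, assuming the usual layout (stepping in bits 3:0, base model in bits 7:4, base family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20). The `struct cpu_id` type and the `decode_cpuid_eax()` helper are hypothetical names for this illustration and are not part of k10temp; the EAX inputs in the usage note are illustrative values chosen to decode to the models discussed here, not values quoted in this thread.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a kernel type). */
struct cpu_id {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode CPUID leaf 1 EAX.
 * When the base family is 0xF, the extended family is added to it and
 * the extended model supplies the upper four bits of the model number.
 */
static struct cpu_id decode_cpuid_eax(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_model = (eax >> 16) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;
	struct cpu_id id;

	id.family = base_family;
	id.model = base_model;
	id.stepping = eax & 0xf;

	if (base_family == 0xf) {
		id.family += ext_family;
		id.model |= ext_model << 4;
	}
	return id;
}
```

With this decoding, an input of 0x00A20F10 yields family 0x19, model 0x21, stepping 0, while 0x00A00F11 yields family 0x19, model 0x01, stepping 1. A model of 0x0 or 0x1 is therefore a distinct model number and not a stepping of model 0x21.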


PLANE0/1 are wrong too: Icore reads zero even after fixing the model.

Looking at these (something is also missing for the 0x71 ZEN2 Ryzens),
they should be:

PLANE0 (ZEN_SVI_BASE + 0x10)
PLANE1 (ZEN_SVI_BASE + 0xc)

Same problem here with model 0x71. 0x31 is for server CPUs.


This is the same as for ZEN2 >= 0x71. Since this is not really documented,
and I have only some confirmation of these numbers from *somewhere* :-),
I created a demo patch only.

I would like AMD people to really have a look at the driver and confirm
the changes, since getting information from *somewhere* doesn't mean it is
100% correct. However, the driver is working with these changes.

In any case, the model needs changing to 0x21 even if we leave the other
readings broken.

Here is my demo patch:

https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch

For family 19h, the patch should look like the following, though this might not matter anymore, as Guenter suggests below.

/* F19h thermal registers through SMN */
#define F19H_M01_SVI_TEL_PLANE0 (ZEN_SVI_BASE + 0x14)
#define F19H_M01_SVI_TEL_PLANE1 (ZEN_SVI_BASE + 0x10)
+/* Zen3 Ryzen */
+#define F19H_M21H_SVI_TEL_PLANE0 (ZEN_SVI_BASE + 0x10)
+#define F19H_M21H_SVI_TEL_PLANE1 (ZEN_SVI_BASE + 0xc)

Then add the following change:

 	switch (boot_cpu_data.x86_model) {
 	case 0x0 ... 0x1:	/* Zen3 */
 		data->show_current = true;
 		data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
 		data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;
 		data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
 		data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
 		k10temp_get_ccd_support(pdev, data, 8);
 		break;
+	case 0x21:		/* Zen3 */
+		data->show_current = true;
+		data->svi_addr[0] = F19H_M21H_SVI_TEL_PLANE0;
+		data->svi_addr[1] = F19H_M21H_SVI_TEL_PLANE1;
+		data->cfactor[0] = F19H_M01H_CFACTOR_ICORE;
+		data->cfactor[1] = F19H_M01H_CFACTOR_ISOC;
+		k10temp_get_ccd_support(pdev, data, 8);
+		break;
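As a standalone illustration of the model dispatch above, the same mapping can be sketched as a small lookup table from family 19h model ranges to the two SVI telemetry addresses. This is not how k10temp is structured; `struct svi_map`, `f19h_svi_map` and `f19h_find_svi()` are hypothetical names, the value of `ZEN_SVI_BASE` is an assumption for this sketch since the thread does not quote it, and the 0x21 offsets are the unconfirmed ones proposed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed base address for this sketch; not quoted in the thread. */
#define ZEN_SVI_BASE			0x0005A000u

/* Offsets as given in the patch fragment above. */
#define F19H_M01_SVI_TEL_PLANE0		(ZEN_SVI_BASE + 0x14)
#define F19H_M01_SVI_TEL_PLANE1		(ZEN_SVI_BASE + 0x10)
#define F19H_M21H_SVI_TEL_PLANE0	(ZEN_SVI_BASE + 0x10)
#define F19H_M21H_SVI_TEL_PLANE1	(ZEN_SVI_BASE + 0xc)

/* Hypothetical table entry: an inclusive model range and its addresses. */
struct svi_map {
	unsigned int model_lo;
	unsigned int model_hi;
	uint32_t plane0;
	uint32_t plane1;
};

static const struct svi_map f19h_svi_map[] = {
	{ 0x00, 0x01, F19H_M01_SVI_TEL_PLANE0,  F19H_M01_SVI_TEL_PLANE1  },
	{ 0x21, 0x21, F19H_M21H_SVI_TEL_PLANE0, F19H_M21H_SVI_TEL_PLANE1 },
};

/* Return the entry covering 'model', or NULL if the model is unknown. */
static const struct svi_map *f19h_find_svi(unsigned int model)
{
	size_t n = sizeof(f19h_svi_map) / sizeof(f19h_svi_map[0]);

	for (size_t i = 0; i < n; i++) {
		if (model >= f19h_svi_map[i].model_lo &&
		    model <= f19h_svi_map[i].model_hi)
			return &f19h_svi_map[i];
	}
	return NULL;
}
```

A caller would treat a NULL result as "no voltage/current telemetry known for this model" and leave those readings disabled, which keeps models 0x00/0x01 supported while adding 0x21 alongside them.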


Also, there is some discussion and testing for both drivers:

https://github.com/ocerman/zenpower/issues/39


Thanks for the information. However, since I do not have time to actively maintain
the driver, since each chip variant seems to use different addresses and scales,
and since the information about voltages and currents is unpublished by AMD,
I'll remove support for voltage/current readings from the upstream driver.
I plan to send the patch doing that to Linus shortly after the commit window
closes (or even before that).

I believe Guenter is talking about https://www.spinics.net/lists/linux-hwmon/msg10252.html.


Thanks,
Guenter