RE: [PATCH 1/3] x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types.

From: Chatradhi, Naveen Krishna
Date: Mon May 24 2021 - 12:43:23 EST


Hi Boris

My apologies for the delayed response. Thanks for your review comments; I will submit a v2 shortly.

Regards,
Naveenk

-----Original Message-----
From: Borislav Petkov <bp@xxxxxxxxx>
Sent: Tuesday, May 11, 2021 10:57 PM
To: Chatradhi, Naveen Krishna <NaveenKrishna.Chatradhi@xxxxxxx>
Cc: linux-edac@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; mingo@xxxxxxxxxx; mchehab@xxxxxxxxxx; M K, Muralidhara <Muralidhara.MK@xxxxxxx>
Subject: Re: [PATCH 1/3] x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types.

On Tue, May 11, 2021 at 08:55:36PM +0530, Naveen Krishna Chatradhi wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e486f96b3cb3..055f3a0acf5e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -90,6 +90,7 @@ static struct smca_bank_name smca_names[] = {
> [SMCA_CS_V2] = { "coherent_slave", "Coherent Slave" },
> [SMCA_PIE] = { "pie", "Power, Interrupts, etc." },
> [SMCA_UMC] = { "umc", "Unified Memory Controller" },
> + [SMCA_UMC_V2] = { "umc_v2", "Unified Memory Controller" },

So this is called "umc_v2" but the other V2 FUs' strings are the same.
Why?
[naveenk:] A heterogeneous system can contain both the SMCA_UMC and SMCA_UMC_V2 variants of the controller, so the two need distinct short names.
I will update the long name to describe the V2 variant accordingly.

Also, if you're going to repeat strings, you can just as well group all those which are the same this way:

[ SMCA_UMC ... SMCA_UMC_V2 ] = { "umc", "Unified Memory Controller" },

and do that for all which have V1 and V2.

I mean, gcc is smart enough to do that behind the scenes for identical strings but you should do that in C too.
[naveenk:] Thanks for the suggestion; I can do this for the other units.
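
For reference, the grouping suggested above can be sketched outside the kernel as follows. This is only an illustration: the enum, struct, table and helper names (demo_bank_types, demo_bank_name, demo_names, demo_bank_long_name) are hypothetical stand-ins and not the definitions in arch/x86/kernel/cpu/mce/amd.c. It assumes a compiler that accepts the GNU C range designator "[first ... last]" (GCC and Clang do), and it only works when the grouped enumerators have consecutive values.

```c
#include <stddef.h>

/*
 * Hypothetical, trimmed-down bank type enum. Each V1/V2 pair must be
 * adjacent, otherwise a range would also cover unrelated entries.
 */
enum demo_bank_types {
	DEMO_CS = 0,
	DEMO_CS_V2,
	DEMO_PIE,
	DEMO_UMC,
	DEMO_UMC_V2,
	N_DEMO_BANK_TYPES
};

/* Hypothetical pair of short and long names for one bank type. */
struct demo_bank_name {
	const char *name;
	const char *long_name;
};

static const struct demo_bank_name demo_names[N_DEMO_BANK_TYPES] = {
	/* GNU C range designator: one initializer fills both slots. */
	[DEMO_CS ... DEMO_CS_V2]   = { "coherent_slave", "Coherent Slave" },
	[DEMO_PIE]                 = { "pie", "Power, Interrupts, etc." },
	[DEMO_UMC ... DEMO_UMC_V2] = { "umc", "Unified Memory Controller" },
};

/* Return the long name for a bank type, or NULL if it is out of range. */
static const char *demo_bank_long_name(enum demo_bank_types type)
{
	if ((unsigned int)type >= N_DEMO_BANK_TYPES)
		return NULL;

	return demo_names[type].long_name;
}
```

With this layout, demo_bank_long_name(DEMO_UMC) and demo_bank_long_name(DEMO_UMC_V2) return the same string. A type that needs its own short name, as discussed for umc_v2 above, would keep a separate single-index entry instead of joining a range.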

> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 5dd905a3f30c..5515fd9336b1 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -323,6 +323,21 @@ static const char * const smca_umc_mce_desc[] = {
> "AES SRAM ECC error",
> };
>
> +static const char * const smca_umc2_mce_desc[] = {

Ok, gcc reuses the identical string pointers from smca_umc_mce_desc[] so we should be ok wrt duplication.

> + "DRAM ECC error",
> + "Data poison error",
> + "SDP parity error",
> + "Reserved",
> + "Address/Command parity error",
> + "Write data parity error",
> + "DCQ SRAM ECC error",
> + "Reserved",
> + "Read data parity error",
> + "Rdb SRAM ECC error",
> + "RdRsp SRAM ECC error",
> + "LM32 MP errors",
> +};

...


> +static const char * const smca_xgmipcs_mce_desc[] = {
> + "DataLossErr",
> + "TrainingErr",
> + "FlowCtrlAckErr",
> + "RxFifoUnderflowErr",
> + "RxFifoOverflowErr",
> + "CRCErr",
> + "BERExceededErr",
> + "TxVcidDataErr",
> + "ReplayBufParityErr",
> + "DataParityErr",
> + "ReplayFifoOverflowErr",
> + "ReplayFIfoUnderflowErr",
> + "ElasticFifoOverflowErr",
> + "DeskewErr",
> + "FlowCtrlCRCErr",
> + "DataStartupLimitErr",
> + "FCInitTimeoutErr",
> + "RecoveryTimeoutErr",
> + "ReadySerialTimeoutErr",
> + "ReadySerialAttemptErr",
> + "RecoveryAttemptErr",
> + "RecoveryRelockAttemptErr",
> + "ReplayAttemptErr",
> + "SyncHdrErr",
> + "TxReplayTimeoutErr",
> + "RxReplayTimeoutErr",
> + "LinkSubTxTimeoutErr",
> + "LinkSubRxTimeoutErr",
> + "RxCMDPktErr",

What happened to those and why aren't they proper words like the other error descriptions?
[naveenk:] Will change these into proper words.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette