Re: [PATCH net] net: stmmac: xgmac: fix safety error descriptions

From: Serge Semin
Date: Thu Jan 25 2024 - 08:48:42 EST


On Thu, Jan 25, 2024 at 10:34:54AM +0800, Furong Xu wrote:
> On Wed, 24 Jan 2024 17:25:27 +0300
> Serge Semin <fancer.lancer@xxxxxxxxx> wrote:
>
> > On Tue, Jan 23, 2024 at 04:50:37PM +0800, Furong Xu wrote:
> > > Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
> > > XGMAC core") prints safety error descriptions when safety error assert,
> > > but missed some special errors, and mixed correctable errors and
> > > uncorrectable errors together.
> > > This patch complete the error code list and print the type of errors.
> >
> > The XGMAC ECC Safety code has likely been just copied from the DW GMAC
> > v5 (DW QoS Eth) part. So this change is partly relevant to that code too. I
> > can't confirm that the special errors support is relevant to the DW
> > QoS Eth too (it likely is though), so what about splitting this patch
> > up into two:
> > 1. Elaborate the errors description for DW GMAC v5 and DW XGMAC.
> > 2. Add new ECC safety errors support.
> > ?
> >
> > On the other hand if we were sure that both DW QoS Eth and XGMAC
> > safety features implementation match the ideal solution would be to
> > refactor out the common code into a dedicated module.
> >
> > -Serge(y)
> >
>

> Checked XGMAC Version 3.20a and DW QoS Eth Version 5.20a, the safety error
> code definitions are not identical at all, they do have some differences,
> about more than 20 bits of status register are different.
> I think we should just leave them in individual implementations.

For some reason you answered to the last part of my comment and
completely ignored the first part which was the main point of my
message.

Regarding the Safety Feature support implemented in QoS Eth and XGMAC
STMMAC modules. You were wrong in using the statement "at all". Except
the optional events enable/disable procedure introduced in the commit
5ac712dcdfef ("net: stmmac: enable platform specific safety
features"), there aren't many differences: at least the errors
handling and report are identical, MTL and DMA error flags match, even
MTL and DMA ECC/Safety IRQ flags match. The only difference is in the
MTL/MAC DPP (Data Parity Protection) part which can be easily factored
out based on the device ID should we attempt to refactor the safety
feature code. See the attached html-diff for more details of what
match and what is different.

Anyway I am not insisting on the refactoring. That was just a
proposal, a more preferred alternative to simultaneously patching two
parts of the drivers looking very much alike. Such refactoring would
improve the code maintainability. The main point of my comment was to
extend your patch for DW QoS Eth safety feature implementation too
since some of the changes you introduced were useful for it too, and
in splitting the patch up since your patch added new flags support
which was unrelated change. Thus your patch would turned into the
two-patches patchset like this:
[Patch 1] would provide an elaborated errors description for both DW
QOS Eth (GMAC v5.x) and DW XGMAC.
[Patch 2] would introduce the new ECC safety errors support.

See my further comments about the respective changes.

>
> > >
> > > Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core")
> > > Signed-off-by: Furong Xu <0x1207@xxxxxxxxx>
> > > ---
> > > .../ethernet/stmicro/stmmac/dwxgmac2_core.c | 36 +++++++++----------
> > > 1 file changed, 18 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > index eb48211d9b0e..ad812484059e 100644
> > > --- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > +++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
> > > @@ -748,29 +748,29 @@ static void dwxgmac3_handle_mac_err(struct net_device *ndev,
> > > }
> > >
> > > static const struct dwxgmac3_error_desc dwxgmac3_mtl_errors[32]= {

> > > - { true, "TXCES", "MTL TX Memory Error" },
> > > + { true, "TXCES", "MTL TX Memory Correctable Error" },

Applicable for both IP-cores
[Patch 1] +QoS, +XGMAC
please apply this change to dwmac5.c too.

> > > { true, "TXAMS", "MTL TX Memory Address Mismatch Error" },

> > > - { true, "TXUES", "MTL TX Memory Error" },
> > > + { true, "TXUES", "MTL TX Memory Uncorrectable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { false, "UNKNOWN", "Unknown Error" }, /* 3 */

> > > - { true, "RXCES", "MTL RX Memory Error" },
> > > + { true, "RXCES", "MTL RX Memory Correctable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { true, "RXAMS", "MTL RX Memory Address Mismatch Error" },

> > > - { true, "RXUES", "MTL RX Memory Error" },
> > > + { true, "RXUES", "MTL RX Memory Uncorrectable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { false, "UNKNOWN", "Unknown Error" }, /* 7 */

> > > - { true, "ECES", "MTL EST Memory Error" },
> > > + { true, "ECES", "MTL EST Memory Correctable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { true, "EAMS", "MTL EST Memory Address Mismatch Error" },

> > > - { true, "EUES", "MTL EST Memory Error" },
> > > + { true, "EUES", "MTL EST Memory Uncorrectable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { false, "UNKNOWN", "Unknown Error" }, /* 11 */

> > > - { true, "RPCES", "MTL RX Parser Memory Error" },
> > > + { true, "RPCES", "MTL RX Parser Memory Correctable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { true, "RPAMS", "MTL RX Parser Memory Address Mismatch Error" },

> > > - { true, "RPUES", "MTL RX Parser Memory Error" },
> > > + { true, "RPUES", "MTL RX Parser Memory Uncorrectable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { false, "UNKNOWN", "Unknown Error" }, /* 15 */

> > > - { false, "UNKNOWN", "Unknown Error" }, /* 16 */
> > > - { false, "UNKNOWN", "Unknown Error" }, /* 17 */
> > > - { false, "UNKNOWN", "Unknown Error" }, /* 18 */
> > > + { true, "SCES", "MTL SGF GCL Memory Correctable Error" },
> > > + { true, "SAMS", "MTL SGF GCL Memory Address Mismatch Error" },
> > > + { true, "SUES", "MTL SGF GCL Memory Uncorrectable Error" },
> > > { false, "UNKNOWN", "Unknown Error" }, /* 19 */
> > > - { false, "UNKNOWN", "Unknown Error" }, /* 20 */
> > > - { false, "UNKNOWN", "Unknown Error" }, /* 21 */
> > > - { false, "UNKNOWN", "Unknown Error" }, /* 22 */
> > > + { true, "RXFCES", "MTL RXF Memory Correctable Error" },
> > > + { true, "RXFAMS", "MTL RXF Memory Address Mismatch Error" },
> > > + { true, "RXFUES", "MTL RXF Memory Uncorrectable Error" },

This introduces the new flags support. Please move this change into a
separate patch (Patch 2):
[Patch 2] +XGMAC only.

My DW QoS Eth v5.10a databook doesn't have these flags defined. If
your 5.20a HW-manual have them described then please add them for DW
QoS Eth too.

> > > { false, "UNKNOWN", "Unknown Error" }, /* 23 */
> > > { false, "UNKNOWN", "Unknown Error" }, /* 24 */
> > > { false, "UNKNOWN", "Unknown Error" }, /* 25 */
> > > @@ -796,13 +796,13 @@ static void dwxgmac3_handle_mtl_err(struct net_device *ndev,
> > > }
> > >
> > > static const struct dwxgmac3_error_desc dwxgmac3_dma_errors[32]= {

> > > - { true, "TCES", "DMA TSO Memory Error" },
> > > + { true, "TCES", "DMA TSO Memory Correctable Error" },

Applicable for both IP-cores
[Patch 1] +QoS, +XGMAC
please apply this change to dwmac5.c too.

> > > { true, "TAMS", "DMA TSO Memory Address Mismatch Error" },

> > > - { true, "TUES", "DMA TSO Memory Error" },
> > > + { true, "TUES", "DMA TSO Memory Uncorrectable Error" },

[Patch 1] +QoS, +XGMAC
ditto

> > > { false, "UNKNOWN", "Unknown Error" }, /* 3 */

> > > - { true, "DCES", "DMA DCACHE Memory Error" },
> > > + { true, "DCES", "DMA DCACHE Memory Correctable Error" },
> > > { true, "DAMS", "DMA DCACHE Address Mismatch Error" },
> > > - { true, "DUES", "DMA DCACHE Memory Error" },
> > > + { true, "DUES", "DMA DCACHE Memory Uncorrectable Error" },

AFAICS applicable for XGMAC only
[Patch 1] +XGMAC only.
Once again, My DW QoS Eth v5.10a databook doesn't have these flags
defined. So if your DW QoS Eth 5.20a HW-manual do have them described
please add them for DW QoS Eth with the elaborated description in the
framework of the Patch 2.

-Serge(y)

> > > { false, "UNKNOWN", "Unknown Error" }, /* 7 */
> > > { false, "UNKNOWN", "Unknown Error" }, /* 8 */
> > > { false, "UNKNOWN", "Unknown Error" }, /* 9 */
> > > --
> > > 2.34.1
> > >
> > >
>