Re: [PATCH v5 02/20] EDAC/synopsys: Fix generic device type detection procedure

From: Serge Semin
Date: Tue Jun 04 2024 - 18:11:40 EST


On Tue, Jun 04, 2024 at 08:38:15PM +0200, Borislav Petkov wrote:
> On Thu, Feb 22, 2024 at 09:12:47PM +0300, Serge Semin wrote:
> > First of all the enum dev_type constants describe the memory DRAM chips
> > used at the stick, not the entire DQ-bus width (see the enumeration kdoc
>
> Which kdoc?
>
> The kernel doc above enum dev_type in include/linux/edac.h?

Right.

>
> In any case, you need to be precise pls.
>
> > for details). So what is returned from the zynqmp_get_dtype() function and
> > then specified to the dimm_info->dtype field is definitely incorrect.
>
> Whoops, you lost me here. Why is it incorrect?

As I said because dev_type is the memory DRAM chips type (individual
DRAM chip data bus width), and not the entire DQ-bus width or its
currently active part. Even from that perspective the function name
and the subsequent return value utilization is incorrect.

Imagine the Xilinx ZynqMP has the 64-bit DQ-bus width. Having
(MSTR.data_bus_width == DDRCTL_EWDTH_16) means that only a quarter of
the bus will be utilized to get data from the DRAMs. So all the
connected DRAM chip(s) data buses are _somehow_ distributed along the
16 bits of the DRAM controller DQ-bus. It can be a single DRAM chip
with 16-bit DQ-bus, or two x8 DRAM chips, or four x4 DRAM chips, etc.

>
> You want
>
> "zynqmp_get_dtype - Return the controller memory width."
>
> to return the memory width supported by the controller?
>
> dimm->dtype = p_data->get_dtype(priv->baseaddr);
>
> enum dev_type dtype; /* memory device type */
>
> Yeah, no, that function returns the DIMM device type.
>
> /me looks at the code.
>
> Aha, so you mean the device width should be determined from that
> DDRC_MSTR_CFG* thing.

That's what is said in "Secondly" and "Thirdly".

>
> > Secondly the DRAM chips type has nothing to do with the data bus width
> > specified in the MSTR.data_bus_width CSR field. That CSR field just
> > determines the part of the whole DQ-bus currently used to access the data
> > from the all DRAM memory chips. So it doesn't indicate the individual
> > chips type. Thirdly the DRAM chips type can be determined only in case of
> > the DDR4 protocol by means of the MSTR.device_config field state (it is
>

> Hold on, this driver runs on all kinds of hardware I presume. Are you
> thinking about older ones which don't do DDR4?
>
> Or does that thing do DDR4 only?

First of all, not that much of the kinds. Just Xilinx ZynqMP DDRC
(based on the DW uMCTL 2.40a IP-core) and some version of DW uMCTL
3.80a being possessed by Dinh Nguyen and, by a lucky coincident, turned
to be mainly compatibly with the Xilinx ZynqMP DDR controller.

Secondly I've checked that part on all the DW uMCTL2 databooks I've
got (I've got lots: versions 1.x, 2.x and 3.x). DW uMCTL2 v1.x doesn't
support DDR4. DW uMCTL2 v2.x and v3.x IP-cores do support DDR4 but
work as I described: the only way to determine the DRAM chips type is
to use the MSTR.device_config field content and for DDR4 only.
MSTR.data_bus_width field has nothing to do with that.

>
> > supposed to be set by the system firmware).

> > Finally the DW uMCTL2 DDRC ECC
> > capability doesn't depend on the memory chips type. Moreover it doesn't
> > depend on the utilized data bus width in runtime either. The IP-core
> > reference manual says in [1,2] that the ECC support can't be enabled
> > during the IP-core synthesizes for the DRAM data bus widths other than 16,
>
> This sentence is missing something.
>
> > 32 or 64. At the same time the bus width mode (MSTR.data_bus_width)
> > doesn't change the ECC feature availability. Thus it was wrong to
> > determine the ECC state with respect to the DQ-bus width mode.

Sorry, but this part doesn't miss anything. It merely says that
neither memory DRAM chips type nor MSTR.data_bus_width value could be
utilized to determine the ECC support. According to the databooks the
ECC support can be available on the IP-cores which _full_ DQ-bus width
is 16, 32 or 64. MSTR.data_bus_width, selecting the active part of the
full DQ-bus, doesn't change the ECC feature availability in anyway.
>From that perspective the zynqmp_get_dtype() utilization in
zynqmp_get_ecc_state() has also been incorrect.

>
> You need to split your paragraphs with newlines to help readability.
> Right now it is a blob of hard to parse text. For example, when you have
> to write "Secondly, " that's your split right there. "Thirdly," is your
> next newline. And so on.

Ok.

>
> > Fix all of the mistakes described above in the zynqmp_get_dtype() and
> > zynqmp_get_ecc_state() methods: specify actual DRAM chips data width only
> > for the DDR4 protocol and return that it's UNKNOWN in the rest of the
> > cases;
>

> What are the rest of the cases and why is it ok to return UNKNOWN all of
> a sudden? IOW, how was the old code even tested?!

First of all, MSTR.data_bus_width field can have only one of the next
three values: 0x1, 0x2 and 0x3. All of them are handled in
zynqmp_get_dtype(). So in the current (incorrect) implementation it
will never return DEV_UNKNOWN.

Secondly, dimm->dtype isn't utilized for something significant in the
EDAC subsystem, but is just exposed to the user-space via the dev_type
sysfs node.

So based on that my bet is that since the incorrect code didn't affect
the main driver functionality and since the dimm->dtype is just
exposed to user-space, the bug has been living just fine unnoticed up
until I started digging into the original DW uMCTL2 HW-manuals,
started studying the driver code, and decided to convert the driver to
supporting generic version of the DW uMCTL2 controller (not only the
Xilinx version of it). That's what this series and the next two ones
are about - about converting the driver to supporting truly generic DW
uMCTL controllers.

>
> > determine ECC availability by the ECCCFG0.ecc_mode field state
> > only (that field can't be modified anyway if the IP-core was synthesized
> > with no ECC support).
> >
> > [1] DesignWare® Cores Enhanced Universal DDR Memory Controller (uMCTL2)
> > Databook, Version 3.91a, October 2020, p. 421.
> > [2] DesignWare® Cores Enhanced Universal DDR Memory Controller (uMCTL2)
> > Databook, Version 3.91a, October 2020, p. 633.
>
> Can those be freely accessed?
>
> If not, you should say so.

No, they can't be.

>
> > Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller")
>
> So this commit is in 4.20.
>
> Does that mean that this fix needs to get backported to all stable
> kernels?

It's up to the stable maintainers to decide.

>
> Have you tested this on all hw this driver supports and made sure no
> regressions are introduced?

I've tested it on the devices with DW uMCTL 2.51a + DDR3 memory and DW
uMCTL 3.10a + DDR4 memory. I am sure this will work for Xilinx ZynqMP
too, especially seeing we've already got the Shubhrajyoti Datta Rb
tag:
https://lore.kernel.org/linux-edac/CAKfKVtErVuCM+pa1e7Lwt0DUU-t-U0eNRnZSw39pfsZ8gv8QZQ@xxxxxxxxxxxxxx/

-Serge(y)

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette