Re: [PATCH v5 02/20] EDAC/synopsys: Fix generic device type detection procedure

From: Borislav Petkov
Date: Mon Jun 10 2024 - 04:01:21 EST


On Wed, Jun 05, 2024 at 01:11:27AM +0300, Serge Semin wrote:
> As I said because dev_type is the memory DRAM chips type (individual
> DRAM chip data bus width), and not the entire DQ-bus width or its
> currently active part. Even from that perspective the function name
> and the subsequent return value utilization is incorrect.

Well, maybe the author misunderstood it but the result of this goes to
sysfs:

dimm->dtype = p_data->get_dtype(priv->baseaddr);

which is in Documentation/ABI/testing/sysfs-devices-edac:

What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type
Date: April 2012
Contact: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
linux-edac@xxxxxxxxxxxxxxx
Description: This attribute file will display what type of DRAM device is
being utilized on this DIMM (x1, x2, x4, x8, ...).

So you'd need to fix the comment above zynqmp_get_dtype() or I can do so
too while applying.

> First of all, not that much of the kinds.

What does that mean: "not that much of the kinds"?

> Just Xilinx ZynqMP DDRC (based on the DW uMCTL 2.40a IP-core) and some
> version of DW uMCTL 3.80a being possessed by Dinh Nguyen and, by
> a lucky coincident, turned to be mainly compatibly with the Xilinx
> ZynqMP DDR controller.

Then Dinh better holler here what the story is.

> > > 32 or 64. At the same time the bus width mode (MSTR.data_bus_width)
> > > doesn't change the ECC feature availability. Thus it was wrong to
> > > determine the ECC state with respect to the DQ-bus width mode.
>
> Sorry, but this part doesn't miss anything.

Gramatically:

"The IP-core reference manual says in [1,2] that the ECC support can't
be enabled during the IP-core synthesizes for the DRAM data bus widths
other than 16,..."

"synthesizes" looks wrong.

It either needs to be

"... be enabled *while* the IP-core synthesizes for the DRAM..." which
still doesn't make too much sense.

Or

"... be enabled during the IP-core *synthesis* for the DRAM..."

I don't know what you mean with that "synthesizes" thing.

> First of all, MSTR.data_bus_width field can have only one of the next
> three values: 0x1, 0x2 and 0x3. All of them are handled in
> zynqmp_get_dtype(). So in the current (incorrect) implementation it
> will never return DEV_UNKNOWN.
>
> Secondly, dimm->dtype isn't utilized for something significant in the
> EDAC subsystem, but is just exposed to the user-space via the dev_type
> sysfs node.

See above.

> So based on that my bet is that since the incorrect code didn't affect
> the main driver functionality and since the dimm->dtype is just
> exposed to user-space, the bug has been living just fine unnoticed up
> until I started digging into the original DW uMCTL2 HW-manuals,
> started studying the driver code, and decided to convert the driver to
> supporting generic version of the DW uMCTL2 controller (not only the
> Xilinx version of it). That's what this series and the next two ones
> are about - about converting the driver to supporting truly generic DW
> uMCTL controllers.

I absolutely don't have a problem with that - good idea!

However, we don't break machines and don't introduce regressions.

> > Can those be freely accessed?
> >
> > If not, you should say so.
>
> No, they can't be.

Then you don't need to mention them.

>
> >
> > > Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller")
> >
> > So this commit is in 4.20.
> >
> > Does that mean that this fix needs to get backported to all stable
> > kernels?
>
> It's up to the stable maintainers to decide.

Haha, you're funny. How can the stable maintainers know whether each
patch that has Fixes: tags is stable material?

Nope, that's up to the maintainer to decide.

> I've tested it on the devices with DW uMCTL 2.51a + DDR3 memory and DW
> uMCTL 3.10a + DDR4 memory. I am sure this will work for Xilinx ZynqMP
> too, especially seeing we've already got the Shubhrajyoti Datta Rb
> tag:

Yes, I've asked him to review that driver because this is not something
I have or use and so on...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette